Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-07-05 08:09:23
Message-ID: 20220705080923.pvhbao6usnpa2i7b@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-07-05 16:33:17 +0900, Masahiko Sawada wrote:
> On Tue, Jul 5, 2022 at 6:18 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> A datum value is convenient to represent both a pointer and a value so
> I used it to avoid defining node types for inner and leaf nodes
> separately.

I'm not convinced that's a good goal. I think we're going to want to have
different key and value types, and trying to unify leaf and inner nodes is
going to make that impossible.

Consider e.g. using it for something like a buffer mapping table - your key
might be way too wide to fit it sensibly into 64bit.

> Since a datum could be 4 bytes or 8 bytes depending it might not be good for
> some platforms.

Right - thats another good reason why it's problematic. A lot of key types
aren't going to be 4/8 bytes dependent on 32/64bit, but either / or.

> > > +void
> > > +radix_tree_insert(radix_tree *tree, uint64 key, Datum val, bool *found_p)
> > > +{
> > > + int shift;
> > > + bool replaced;
> > > + radix_tree_node *node;
> > > + radix_tree_node *parent = tree->root;
> > > +
> > > + /* Empty tree, create the root */
> > > + if (!tree->root)
> > > + radix_tree_new_root(tree, key, val);
> > > +
> > > + /* Extend the tree if necessary */
> > > + if (key > tree->max_val)
> > > + radix_tree_extend(tree, key);
> >
> > FWIW, the reason I used separate functions for these in the prototype is that
> > it turns out to generate a lot better code, because it allows non-inlined
> > function calls to be sibling calls - thereby avoiding the need for a dedicated
> > stack frame. That's not possible once you need a palloc or such, so splitting
> > off those call paths into dedicated functions is useful.
>
> Thank you for the info. How much does using sibling call optimization
> help the performance in this case? I think that these two cases are
> used only a limited number of times: inserting the first key and
> extending the tree.

It's not that it helps in the cases moved into separate functions - it's that
not having that code in the "normal" paths keeps the normal path faster.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-07-05 08:11:26 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Noah Misch 2022-07-05 07:51:39 Re: typos