Quick Links

Re: [PoC] Improve dead tuple storage for lazy vacuum

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	John Naylor <johncnaylorls(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject:	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:	2024-03-04 06:05:12
Message-ID:	CAD21AoAnTo4SWrdjXA7xWuEmGtAKG=M1-hB+jjhJGQdMLEDjGA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Mar 3, 2024 at 2:43 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Fri, Mar 1, 2024 at 3:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 29, 2024 at 8:43 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > > + ts->rt_context = AllocSetContextCreate(CurrentMemoryContext,
> > > + "tidstore storage",
> > >
> > > "tidstore storage" sounds a bit strange -- maybe look at some other
> > > context names for ideas.
> >
> > Agreed. How about "tidstore's radix tree"?
>
> That might be okay. I'm now thinking "TID storage". On that note, one
> improvement needed when we polish tidstore.c is to make sure it's
> spelled "TID" in comments, like other files do already.
>
> > > - leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx, allocsize);
> > > + leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx != NULL
> > > + ? tree->leaf_ctx
> > > + : tree->context, allocsize);
> > >
> > > Instead of branching here, can we copy "context" to "leaf_ctx" when
> > > necessary (those names should look more like eachother, btw)? I think
> > > that means anything not covered by this case:
> > >
> > > +#ifndef RT_VARLEN_VALUE_SIZE
> > > + if (sizeof(RT_VALUE_TYPE) > sizeof(RT_PTR_ALLOC))
> > > + tree->leaf_ctx = SlabContextCreate(ctx,
> > > + RT_STR(RT_PREFIX) "radix_tree leaf contex",
> > > + RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> > > + sizeof(RT_VALUE_TYPE));
> > > +#endif /* !RT_VARLEN_VALUE_SIZE */
> > >
> > > ...also, we should document why we're using slab here. On that, I
> > > don't recall why we are? We've never had a fixed-length type test case
> > > on 64-bit, so it wasn't because it won through benchmarking. It seems
> > > a hold-over from the days of "multi-value leaves". Is it to avoid the
> > > possibility of space wastage with non-power-of-two size types?
> >
> > Yes, it matches my understanding.
>
> There are two issues quoted here, so not sure if you mean both or only
> the last one...

I meant only the last one.

>
> For the latter, I'm not sure it makes sense to have code and #ifdef's
> to force slab for large-enough fixed-length values just because we
> can. There may never be such a use-case anyway. I'm also not against
> it, either, but it seems like a premature optimization.

Reading the old threads, the fact that using a slab context for leaves
originally came from Andres's prototype patch, was to avoid rounding
up the bytes to a power of 2 number by aset.c. It makes sense to me to
use a slab context for this case. To measure the effect of using a
slab, I've updated bench_radix_tree so it uses a large fixed-length
value. The struct I used is:

typedef struct mytype
{
uint64 a;
uint64 b;
uint64 c;
uint64 d;
char e[100];
} mytype;

The struct size is 136 bytes with padding, just above a power-of-2.
The simple benchmark test showed using a slab context for leaves is
more space efficient. The results are:

slab:
= #select * from bench_load_random_int(1000000);
mem_allocated | load_ms
---------------+---------
405643264 | 560
(1 row)

aset:
=# select * from bench_load_random_int(1000000);
mem_allocated | load_ms
---------------+---------
527777792 | 576
(1 row)

>
> > > For this stanza that remains unchanged:
> > >
> > > for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++)
> > > {
> > > MemoryContextDelete(tree->node_slabs[i]);
> > > }
> > >
> > > if (tree->leaf_ctx)
> > > {
> > > MemoryContextDelete(tree->leaf_ctx);
> > > }
> > >
> > > ...is there a reason we can't just delete tree->ctx, and let that
> > > recursively delete child contexts?
> >
> > I thought that considering the RT_CREATE doesn't create its own memory
> > context but just uses the passed context, it might be a bit unusable
> > to delete the passed context in the radix tree code. For example, if a
> > caller creates a radix tree (or tidstore) on a memory context and
> > wants to recreate it again and again, he also needs to re-create the
> > memory context together. It might be okay if we leave comments on
> > RT_CREATE as a side effect, though. This is the same reason why we
> > don't destroy tree->dsa in RT_FREE(). And, as for RT_FREE_RECURSE(),
>
> Right, I should have said "reset". Resetting a context will delete
> it's children as well, and seems like it should work to reset the tree
> context, and we don't have to know whether that context actually
> contains leaves at all. That should allow copying "tree context" to
> "leaf context" in the case where we have no special context for
> leaves.

Resetting the tree->context seems to work. But I think we should note
for callers that the dsa_area passed to RT_CREATE should be created in
a different context than the context passed to RT_CREATE because
otherwise RT_FREE() will also free the dsa_area. For example, the
following code in test_radixtree.c will no longer work:

dsa = dsa_create(tranche_id);
radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id);
:
rt_free(radixtree);
dsa_detach(dsa); // dsa is already freed.

So I think that a practical usage of the radix tree will be that the
caller creates a memory context for a radix tree and passes it to
RT_CREATE().

I've attached an update patch set:

- 0008 updates RT_FREE_RECURSE().
- 0009 patch is an updated version of cleanup radix tree memory handling.
- 0010 updates comments in tidstore.c such as replacing "Tid" with "TID".
- 0011 rename TidStore to TIDSTORE all places.
- 0012 update bench_radix_tree so it uses a (possibly large) struct
instead of uint64.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v65-ART.tar.gz	application/x-gzip	67.4 KB

In response to

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2024-03-03 05:43:05 from John Naylor

Responses

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2024-03-04 11:48:21 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2024-03-04 06:05:15	Re: Schema variables - new implementation for Postgres 15
Previous Message	Andrey M. Borodin	2024-03-04 05:49:48	Re: Permute underscore separated components of columns before fuzzy matching