Re: [PoC] Improve dead tuple storage for lazy vacuum

From:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:	2022-10-05 09:40:31
Message-ID:	CAFBsxsETfmYHmXw7mvb-hUC3fJHNR8DhTHizOdt7BEzAzbcMgw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 5, 2022 at 1:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Sep 28, 2022 at 12:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 23, 2022 at 12:11 AM John Naylor
> > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > Yeah, node31 and node256 are bloated. We probably could use slab for
> > node256 independently. It's worth trying a benchmark to see how it
> > affects the performance and the tree size.

This wasn't the focus of your current email, but while experimenting with
v6 I had another thought about local allocation: If we use the default slab
block size of 8192 bytes, then only 3 chunks of size 2088 can fit, right?
If so, since aset and DSA also waste at least a few hundred bytes, we could
store a useless 256-byte slot array within node256. That way, node128 and
node256 share the same start of pointers/values array, so there would be
one less branch for getting that address. In v6, rt_node_get_values and
rt_node_get_children are not inlined (aside: gcc uses a jump table for 5
kinds but not for 4), but possibly should be, and the smaller the better.
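
To make the arithmetic and the layout idea concrete, here is a rough
sketch. It ignores slab's per-block and per-chunk headers, and all demo_*
names and struct fields are hypothetical, not taken from the patch. It
only illustrates that padding node256 with an unused 256-byte array still
leaves 3 chunks per 8192-byte block, and that both node kinds can then
share one offset for the pointers array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes quoted in the discussion; allocator headers are ignored here. */
#define SLAB_BLOCK_SIZE   8192
#define NODE256_SIZE      2088
#define SLOT_ARRAY_SIZE   256

/* Number of whole chunks of chunk_size that fit in one block. */
static inline size_t
chunks_per_block(size_t block_size, size_t chunk_size)
{
    assert(chunk_size > 0);
    return block_size / chunk_size;
}

/* Bytes left unused at the end of a block after packing whole chunks. */
static inline size_t
block_waste(size_t block_size, size_t chunk_size)
{
    return block_size - chunks_per_block(block_size, chunk_size) * chunk_size;
}

/* Hypothetical common header; the real node header may differ. */
typedef struct demo_node_header
{
    uint16_t    count;
    uint8_t     shift;
    uint8_t     chunk;
    uint8_t     kind;
} demo_node_header;

/* node128: a 256-entry slot index array followed by 128 children. */
typedef struct demo_node128
{
    demo_node_header h;
    uint8_t     slot_idxs[SLOT_ARRAY_SIZE];
    void       *children[128];
} demo_node128;

/* node256: same prefix, but the 256-byte array is deliberately unused. */
typedef struct demo_node256
{
    demo_node_header h;
    uint8_t     unused[SLOT_ARRAY_SIZE];
    void       *children[256];
} demo_node256;

/* With identical prefixes, the children array starts at the same offset. */
_Static_assert(offsetof(demo_node128, children) ==
               offsetof(demo_node256, children),
               "node128 and node256 must share the children offset");

/* One address computation serves both kinds, with no branch on the kind. */
static inline void **
demo_node_children(demo_node_header *node)
{
    return (void **) ((char *) node + offsetof(demo_node128, children));
}
```

Under these assumptions, 8192 / 2088 gives 3 chunks with 1928 bytes left
over, and 8192 / (2088 + 256) still gives 3 chunks with 1160 bytes left
over, so the extra array costs no additional blocks.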

> Regarding DSA support, IIUC we need to use dsa_pointer in inner nodes
> to point to its child nodes, instead of C pointers (i.e., backend-local
> address). I'm thinking of a straightforward approach as the first
> step; inner nodes have a union of rt_node* and dsa_pointer and we
> choose either one based on whether the radix tree is shared or not. We
> allocate and free the shared memory for individual nodes by
> dsa_allocate() and dsa_free(), respectively. Therefore we need to get
> a C pointer from dsa_pointer by using dsa_get_address() while
> descending the tree. I'm a bit concerned that calling
> dsa_get_address() for every descent could be performance overhead but
> I'm going to measure it anyway.
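
For readers following along, the union approach described above might
look roughly like the sketch below. Everything here is a stand-in:
demo_dsa_pointer, demo_area and demo_get_address() only mimic the roles of
dsa_pointer, the DSA area and dsa_get_address(), and are not the real
PostgreSQL types or API. The point is just to show where the per-descent
translation and the extra branch would sit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for dsa_pointer: modeled here as a plain offset into an area. */
typedef uint64_t demo_dsa_pointer;

/* Stand-in for a shared memory area mapped at a backend-specific address. */
typedef struct demo_area
{
    char       *base;
    size_t      size;
} demo_area;

typedef struct demo_node demo_node;

/* A child reference is either a C pointer or a shared-memory pointer. */
typedef union demo_child
{
    demo_node  *local;          /* used when the tree is backend-local */
    demo_dsa_pointer shared;    /* used when the tree lives in shared memory */
} demo_child;

struct demo_node
{
    uint8_t     chunk;
    demo_child  child;
};

typedef struct demo_tree
{
    bool        is_shared;
    demo_area  *area;           /* only meaningful when is_shared is true */
} demo_tree;

/* Mock translation step, playing the role of dsa_get_address(). */
static inline void *
demo_get_address(const demo_area *area, demo_dsa_pointer dp)
{
    assert(dp < area->size);
    return area->base + dp;
}

/* Resolve a child reference to a usable C pointer while descending. */
static inline demo_node *
demo_child_ptr(const demo_tree *tree, demo_child child)
{
    if (tree->is_shared)
        return (demo_node *) demo_get_address(tree->area, child.shared);
    return child.local;
}
```

In this model, every level of a shared-tree descent pays one call to the
translation function, which is the overhead that would need measuring.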

Are dsa pointers aligned the same as pointers to locally allocated memory?
Meaning, is the offset portion always a multiple of 4 (or 8)? It seems that
way from a glance, but I can't say for sure. If the lower 2 bits of a DSA
pointer are never set, we can tag them the same way as a regular pointer.
That same technique could help hide the latency of converting the pointer,
in the same way it would hide the latency of loading parts of a node into
CPU registers.
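
As a minimal sketch of the tagging idea, assuming (unverified, as noted
above) that the low 2 bits of both kinds of pointer are always zero: the
node kind can be packed into those bits and masked off before use. The
demo_* names are hypothetical, and the value is treated as a plain 64-bit
integer so the same helpers would apply to a local pointer cast through
uintptr_t or to a shared-memory offset.

```c
#include <assert.h>
#include <stdint.h>

/* Two low bits are enough to encode four distinct node kinds. */
#define DEMO_TAG_BITS   2
#define DEMO_TAG_MASK   ((uint64_t) ((1u << DEMO_TAG_BITS) - 1))

/* Pack a node kind into the low bits of an aligned pointer value. */
static inline uint64_t
demo_tag(uint64_t ptr, unsigned kind)
{
    assert((ptr & DEMO_TAG_MASK) == 0);     /* relies on alignment */
    assert(kind <= DEMO_TAG_MASK);
    return ptr | kind;
}

/* Read the node kind without dereferencing (or translating) the pointer. */
static inline unsigned
demo_tag_kind(uint64_t tagged)
{
    return (unsigned) (tagged & DEMO_TAG_MASK);
}

/* Strip the tag to recover the original pointer value. */
static inline uint64_t
demo_untag(uint64_t tagged)
{
    return tagged & ~DEMO_TAG_MASK;
}
```

Because the kind is available from the tagged value itself, the code can
start dispatching on it while the address conversion or the node load is
still in flight.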

One concern is, handling both local and dsa cases in the same code requires
more (predictable) branches and reduces code density. That might be a
reason in favor of templating to handle each case in its own translation
unit. But that might be overkill.
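
A small sketch of what templating could mean here, compressed into one
file for illustration: a macro stamps out one branch-free resolver per
pointer representation, so neither variant tests a "shared" flag at run
time. In the tree this would more likely be a header included once per
variant with different macro definitions, in the style of
lib/simplehash.h. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tree context; base is only used by the shared variant. */
typedef struct demo_ctx
{
    char       *base;
} demo_ctx;

/*
 * Generate demo_resolve_<suffix>(), which turns a child reference of
 * child_type into a C pointer using to_address_expr.
 */
#define DEMO_DEFINE_RESOLVE(suffix, child_type, to_address_expr) \
    static inline void * \
    demo_resolve_##suffix(const demo_ctx *ctx, child_type child) \
    { \
        (void) ctx; \
        return (to_address_expr); \
    }

/* Local variant: the child reference already is a C pointer. */
DEMO_DEFINE_RESOLVE(local, void *, child)

/* Shared variant: the child reference is an offset from the area base. */
DEMO_DEFINE_RESOLVE(shared, uint64_t, ctx->base + child)
```

The cost is two copies of the tree code in the binary, which is the
trade-off against the extra branches of a single combined version.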
--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2022-10-05 06:45:38 from Masahiko Sawada

Responses

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2022-10-06 07:52:26 from Masahiko Sawada
