RFC: Packing the buffer lookup table

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject:	RFC: Packing the buffer lookup table
Date:	2025-01-30 07:48:56
Message-ID:	CAEze2WgYoUu7mYiUN0_+VdaXx82gG4BfdcDRaTaQ2fjeVWt4kw@mail.gmail.com
Lists:	pgsql-hackers

Hi,

Some time ago I noticed that every buffer table entry is quite large at 40
bytes (+8): 16 bytes of HASHELEMENT header (of which the last 4 bytes are
padding), 20 bytes of BufferTag, and 4 bytes for the offset into the shared
buffers array, with generally 8 more bytes used for the bucket pointers.
(32-bit systems: 32 (+4) bytes)
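To make the arithmetic above easy to check, here is a minimal sketch of the layout. The `Sketch*` types are hypothetical stand-ins I wrote for illustration (not the real HASHELEMENT, BufferTag or BufferLookupEnt definitions), and the alignment macro assumes that the maximum alignment equals the pointer size, which holds on common platforms but is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: maximum alignment equals pointer size (8 on typical 64-bit). */
#define SKETCH_MAXALIGN(len) \
	(((size_t) (len) + (sizeof(void *) - 1)) & ~(size_t) (sizeof(void *) - 1))

/* Stand-in for HASHELEMENT: chain link plus the cached hash value. */
typedef struct SketchHashElement
{
	struct SketchHashElement *link;	/* next element in the same bucket */
	uint32_t	hashvalue;			/* cached hash of the key */
	/* on 64-bit platforms, 4 bytes of tail padding follow here */
} SketchHashElement;

/* Stand-in for BufferTag: five 4-byte fields, 20 bytes. */
typedef struct SketchBufferTag
{
	uint32_t	spcOid;
	uint32_t	dbOid;
	uint32_t	relNumber;
	int32_t		forkNum;
	uint32_t	blockNum;
} SketchBufferTag;

/* Stand-in for the buffer table entry: the tag plus the buffer id. */
typedef struct SketchLookupEnt
{
	SketchBufferTag key;
	int32_t		id;
} SketchLookupEnt;

/* Size of one hash element: aligned header followed by the aligned entry. */
static size_t
sketch_element_size(void)
{
	return SKETCH_MAXALIGN(sizeof(SketchHashElement)) +
		SKETCH_MAXALIGN(sizeof(SketchLookupEnt));
}

/* Element size plus one bucket pointer per element. */
static size_t
sketch_per_entry_total(void)
{
	return sketch_element_size() + sizeof(void *);
}

/* Sanity check on the stand-in tag: no padding between the five fields. */
static void
sketch_self_check(void)
{
	assert(sizeof(SketchBufferTag) == 20);
}
```

On a typical 64-bit build, `sketch_element_size()` gives 40 and `sketch_per_entry_total()` gives 48. With 4-byte pointers and 4-byte maximum alignment, the same functions give 32 and 36.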

Does anyone know why we must have the buffer tag in the buffer table?
It seems to me we can follow the offset pointer into the shared BufferDesc
array whenever we find out we need to compare the tags (as opposed to just
the hash, which is stored and present in HASHELEMENT). If we decide to just
follow the pointer, we can immediately shave 16 bytes (40%) off the lookup
table's per-element size, or 24 if we pack the 4-byte shared buffer offset
into the unused bytes in HASHELEMENT, reducing the memory usage of that
hash table by ~50%: we'd have 16 bytes for every HASHELEMENT plus shared
buffer offset, plus 8 bytes for every bucket pointer (of which there are
approximately as many as there are elements), resulting in 24 bytes per
max_alloc element, down from 48.
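A minimal sketch of the "follow the pointer" idea, to show what the lookup would look like. This is not the patch: all `sketch_*` names are hypothetical, the hash is a simple FNV-1a variant chosen only for illustration, and locking and concurrent tag changes are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NBUFFERS 8
#define SKETCH_NBUCKETS 4

typedef struct SketchTag
{
	uint32_t	spcOid, dbOid, relNumber;
	int32_t		forkNum;
	uint32_t	blockNum;
} SketchTag;

/* Stand-in for BufferDesc; only the tag matters for this sketch. */
typedef struct SketchBufferDesc
{
	SketchTag	tag;
} SketchBufferDesc;

/* Tag-less table element: the buffer id sits where the padding used to be. */
typedef struct SketchElem
{
	struct SketchElem *link;
	uint32_t	hashvalue;
	int32_t		buf_id;
} SketchElem;

static SketchBufferDesc sketch_descs[SKETCH_NBUFFERS];
static SketchElem sketch_elems[SKETCH_NBUFFERS];
static SketchElem *sketch_buckets[SKETCH_NBUCKETS];

/* Illustrative hash only (FNV-1a over the five fields). */
static uint32_t
sketch_hash(const SketchTag *t)
{
	const uint32_t f[5] = {t->spcOid, t->dbOid, t->relNumber,
		(uint32_t) t->forkNum, t->blockNum};
	uint32_t	h = 2166136261u;

	for (int i = 0; i < 5; i++)
	{
		h ^= f[i];
		h *= 16777619u;
	}
	return h;
}

static bool
sketch_tags_equal(const SketchTag *a, const SketchTag *b)
{
	return a->spcOid == b->spcOid && a->dbOid == b->dbOid &&
		a->relNumber == b->relNumber && a->forkNum == b->forkNum &&
		a->blockNum == b->blockNum;
}

/* Store the tag in the descriptor only; the table keeps hash + buffer id. */
static void
sketch_insert(const SketchTag *tag, int32_t buf_id)
{
	uint32_t	h = sketch_hash(tag);
	SketchElem *e;

	assert(buf_id >= 0 && buf_id < SKETCH_NBUFFERS);
	e = &sketch_elems[buf_id];
	sketch_descs[buf_id].tag = *tag;
	e->hashvalue = h;
	e->buf_id = buf_id;
	e->link = sketch_buckets[h % SKETCH_NBUCKETS];
	sketch_buckets[h % SKETCH_NBUCKETS] = e;
}

/* Compare cached hashes first; read the descriptor's tag only on a hash hit. */
static int32_t
sketch_lookup(const SketchTag *tag)
{
	uint32_t	h = sketch_hash(tag);

	for (SketchElem *e = sketch_buckets[h % SKETCH_NBUCKETS]; e; e = e->link)
	{
		if (e->hashvalue == h &&
			sketch_tags_equal(&sketch_descs[e->buf_id].tag, tag))
			return e->buf_id;
	}
	return -1;					/* not found */
}
```

The tag comparison costs one extra dependent read into the descriptor array, but only for elements whose cached hash already matches.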

(This was also discussed on Discord in the Hackers Mentoring server, over
at [0])

Together that results in the following prototype patchset. 0001 adds the
ability for dynahash users to opt in to using the 4-byte alignment hole in
HASHELEMENT (by providing size- and alignment info that dynahash uses to
partially move the entry into the alignment hole), 0002 uses that feature
to get the per-element size of the buffer lookup hash table down to 16
bytes (+8B for bucket pointers), or 12 (+4) on 32-bit systems.
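As a rough illustration of the "element data offset" idea in 0001, the computation could look like the sketch below. The names and the exact formula are my assumptions for this example and are not copied from the patch: the entry starts at the first suitably aligned byte after the used part of the header, and the whole element is then rounded up to pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to a multiple of a (a must be a power of two). */
#define SKETCH_TYPEALIGN(a, len) \
	(((size_t) (len) + ((size_t) (a) - 1)) & ~((size_t) (a) - 1))

/* Stand-in for HASHELEMENT. */
typedef struct SketchHeader
{
	struct SketchHeader *link;
	uint32_t	hashvalue;
} SketchHeader;

/* Header bytes that are actually used, excluding any tail padding. */
#define SKETCH_HEADER_USED \
	(offsetof(SketchHeader, hashvalue) + sizeof(uint32_t))

/* Where the entry data may start, given the entry's alignment need. */
static size_t
sketch_data_offset(size_t entry_align)
{
	assert(entry_align != 0 && (entry_align & (entry_align - 1)) == 0);
	return SKETCH_TYPEALIGN(entry_align, SKETCH_HEADER_USED);
}

/* Total element size once the entry is packed behind the header fields. */
static size_t
sketch_packed_element_size(size_t entry_size, size_t entry_align)
{
	return SKETCH_TYPEALIGN(sizeof(void *),
							sketch_data_offset(entry_align) + entry_size);
}
```

With 8-byte pointers, a 4-byte entry with 4-byte alignment starts at offset 12 and the element is 16 bytes in total. An entry that needs 8-byte alignment cannot use the hole, so its data still starts at offset 16.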

An alternative approach to current patch 1 (which introduces "element data
offset" to determine where to start looking for the key) would be to add an
option to allow "0-length" keys/entries when there is alignment space, and
make the hash/compare functions handle writing/reading of key data (thus
removing the new data dependencies in the hash lookup function), but I'm
not sure that's a winning idea as that requires the user of the API to have
knowledge about the internals of dynahash, rather than dynahash internally
optimizing usage based on a clearer picture of what the hash entry needs.

Does anyone have an idea on how to best benchmark this kind of patch, apart
from "running pgbench"? Other ideas on how to improve this? Specific
concerns?

Kind regards,

Matthias van de Meent

[0] https://discord.com/channels/1258108670710124574/1318997914777026580

Attachment	Content-Type	Size
v0-0002-Buftable-Reduce-size-of-buffer-table-entries-by-6.patch	application/x-patch	7.5 KB
v0-0001-Dynahash-Allow-improved-packing-of-hash-elements.patch	application/x-patch	12.6 KB

Responses

Re: RFC: Packing the buffer lookup table at 2025-01-31 17:22:50 from James Hunter
Re: RFC: Packing the buffer lookup table at 2025-02-01 05:01:31 from Zhang Mingli
Re: RFC: Packing the buffer lookup table at 2025-02-04 18:58:36 from Matthias van de Meent
Re: RFC: Packing the buffer lookup table at 2025-02-05 01:14:17 from Andres Freund