From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Zhang Mingli <zmlpostgres(at)gmail(dot)com> |
Subject: | Re: RFC: Packing the buffer lookup table |
Date: | 2025-02-04 18:58:36 |
Message-ID: | CAEze2WiRo4Zu71jwxYmqjq6XK814Avf2-kytaL6n=BreZR2ZbA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 30 Jan 2025 at 08:48, Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> Together that results in the following prototype patchset.
Here's an alternative patch, which replaces dynahash in the buffer
lookup table with an open-coded replacement that has fewer
indirections during lookups, and allows for a much increased locality.
Overall, it has less memory usage than our current dynahash in all
cases; but compared to my previous patch it improves memory only in
some cases, in others it uses more memory.
The main design is a separate chaining hash table (similar to and
derived from code of Dynahash) to provide efficient access to free
elements. The innovation over Dynahash is that in this new hash table
the stored elements are slotted between the bucket's element pointers,
thus allowing access to hash elements without further cache misses if
the element that stores the bucket's data is stored on the same cache
line (very possible). Further, the implementation uses int-based
offset pointers, reducing pointer size overhead on 64-bit system and
limiting the hash table to 2^31 elements (for a buffer mapping table
that seems to be fine, given our current buffer count limit of
2^30-1).
Finally, it shows up with accurate memory usage in
pg_shmem_allocations due to the usage of a single allocation,
improving user debugability of resources vs dynahash.
0001 implements this hash table, with some minimal optimizations.
0002 adds the optimization where the hash table prefers to pick the
local hash element if it's not already in use.
Future optimizations could implement memory prefetching (so that
buffer tag compares won't have cache misses for memory stalls),
cacheline-local entry selection, and other optimizations, but that'll
need more effort which I don't want to put into it before getting a
green light on "this might actually be a good step in the right
direction".
As to why simplehash was not chosen: Simplehash doesn't (can't?)
support partitioning due to its open addressing model. By taking the
locality of open addressing and combining it with the partitioning
capabilities of separate chaining we get the best of both worlds (or
at least closer than dynahash got).
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
PS. This isn't a perfect solution; it still causes approximately
random memory accesses when we look up sequential pages of a relation.
However, the cost of these memory accesses can now be significantly
reduced without breaking APIs or significantly impacting memory
overheads. Replacing the hashtable with another seems fairly
straightforward, while replacing the hashtable with e.g. radixtree.h
would be a very significant undertaking of shared memory allocations
that I'm not comfortable with writing myself.
Attachment | Content-Type | Size |
---|---|---|
v1-0001-BufTable-Use-optimized-hash-table-for-buffer-look.patch | application/octet-stream | 15.4 KB |
v1-0002-BufTable-Optimize-allocation-of-BufTable-entries.patch | application/octet-stream | 8.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias van de Meent | 2025-02-04 19:03:10 | Re: hash_search_with_hash_value is high in "perf top" on a replica |
Previous Message | Alvaro Herrera | 2025-02-04 18:41:12 | Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints |