From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> |
Cc: | Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: hash_search_with_hash_value is high in "perf top" on a replica |
Date: | 2025-02-01 15:55:38 |
Message-ID: | ii7jyq47owdsce5bwelxusaknc7fqsnhuljnx4vfafu5ob7sy6@akljdxwtnhiw |
Lists: | pgsql-hackers |
Hi,
On 2025-02-01 15:43:41 +0100, Ants Aasma wrote:
> On Fri, Jan 31, 2025, 15:43 Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > > Maybe it's a red herring though, but it looks pretty suspicious.
> >
> > It's unfortunately not too surprising - our buffer mapping table is a
> > pretty big bottleneck. Both because a hash table is just not a good fit
> > for the buffer mapping table due to the lack of locality and because
> > dynahash is a really poor hash table implementation.
> >
>
> I measured similar things when looking at apply throughput recently. For
> in-cache workloads buffer lookup and locking was about half of the load.
>
> One other direction is to extract more memory concurrency. Prefetcher could
> batch multiple lookups together so CPU OoO execution has a chance to fire
> off multiple memory accesses at the same time.
I think at the moment we have a *hilariously* cache-inefficient buffer lookup;
that's the first thing to address. A hash table for buffer mapping lookups imo
is a bad idea, due to losing all locality in a workload that exhibits a *lot*
of locality. But furthermore, dynahash.c is very far from a cache-efficient
hashtable implementation.
The other aspect is that in many workloads we'll look up a small set of
buffers over and over, which a) wastes cycles and b) wastes cache space for
stuff that could be elided much more efficiently.
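To illustrate the repeated-lookup point, here is a minimal sketch of a small
direct-mapped cache placed in front of an expensive lookup. It is not
PostgreSQL code: DemoTag, DemoCache, demo_slow_lookup() and demo_lookup() are
all hypothetical names, and demo_slow_lookup() merely stands in for a shared
hash table probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer tag; NOT PostgreSQL's BufferTag. */
typedef struct DemoTag
{
    uint32_t rel;
    uint32_t block;
} DemoTag;

#define DEMO_CACHE_SLOTS 8      /* must be a power of two */

typedef struct DemoCacheEntry
{
    DemoTag tag;
    int     buf_id;
    bool    valid;
} DemoCacheEntry;

typedef struct DemoCache
{
    DemoCacheEntry slots[DEMO_CACHE_SLOTS];
    uint64_t hits;              /* lookups answered from the small cache */
    uint64_t slow_lookups;      /* lookups that fell through to the slow path */
} DemoCache;

/* Assumed stand-in for the expensive shared hash table lookup. */
static int
demo_slow_lookup(DemoCache *c, DemoTag tag)
{
    c->slow_lookups++;
    return (int) ((tag.rel * 31u + tag.block) % 1024u);
}

/* Map a tag to one slot of the direct-mapped cache. */
static size_t
demo_slot(DemoTag tag)
{
    uint32_t h = (tag.rel * 2654435761u) ^ (tag.block * 40503u);

    return (size_t) (h & (DEMO_CACHE_SLOTS - 1));
}

/* Return the buffer id for tag, skipping the slow path on a repeat lookup. */
static int
demo_lookup(DemoCache *c, DemoTag tag)
{
    DemoCacheEntry *e;

    assert(c != NULL);
    e = &c->slots[demo_slot(tag)];

    if (e->valid && e->tag.rel == tag.rel && e->tag.block == tag.block)
    {
        c->hits++;
        return e->buf_id;
    }

    /* Miss: take the slow path and remember the result in this slot. */
    e->tag = tag;
    e->buf_id = demo_slow_lookup(c, tag);
    e->valid = true;
    return e->buf_id;
}
```

With a zero-initialized DemoCache, looking up the same tag three times takes
the slow path once and hits the small cache twice. A real version would also
have to cope with the mapping changing underneath it (for example by
re-checking the tag after pinning the buffer), which this sketch ignores
entirely.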
We also do a lot of hash lookups for smgr, because we don't have any
cross-record caching infrastructure for that.
> The other direction is to split off WAL decoding, buffer lookup and maybe
> even pinning to a separate process from the main redo loop.
Maybe, but I think we're rather far away from those things being the most
productive thing to tackle.
Greetings,
Andres Freund