Re: hash_search_with_hash_value is high in "perf top" on a replica

From: Andres Freund <andres(at)anarazel(dot)de>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_search_with_hash_value is high in "perf top" on a replica
Date: 2025-02-01 15:55:38
Message-ID: ii7jyq47owdsce5bwelxusaknc7fqsnhuljnx4vfafu5ob7sy6@akljdxwtnhiw
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-02-01 15:43:41 +0100, Ants Aasma wrote:
> On Fri, Jan 31, 2025, 15:43 Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > > Maybe it's a red herring though, but it looks pretty suspicious.
> >
> > It's unfortunately not too surprising - our buffer mapping table is a
> > pretty
> > big bottleneck. Both because a hash table is just not a good fit for the
> > buffer mapping table due to the lack of locality and because dynahash is
> > really poor hash table implementation.
> >
>
> I measured similar things when looking at apply throughput recently. For
> in-cache workloads buffer lookup and locking was about half of the load.
>
> One other direction is to extract more memory concurrency. Prefetcher could
> batch multiple lookups together so CPU OoO execution has a chance to fire
> off multiple memory accesses at the same time.

I think at the moment we have a *hilariously* cache-inefficient buffer lookup,
that's the first thing to address. A hash table for buffer mapping lookups imo
is a bad idea, due to loosing all locality in a workload that exhibits a *lot*
of locality. But furthermore, dynahash.c is very far from a cache efficient
hashtable implementation.

The other aspect is that in many workloads we'll look up a small set of
buffers over and over, which a) wastes cycles b) wastes cache space for stuff
that could be elided much more efficiently.

We also do a lot of hash lookups for smgr, because we don't have any
cross-record caching infrastructure for that.

> The other direction is to split off WAL decoding, buffer lookup and maybe
> even pinning to a separate process from the main redo loop.

Maybe, but I think we're rather far away from those things being the most
productive thing to tackle.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2025-02-01 16:00:01 Re: Improving tracking/processing of buildfarm test failures
Previous Message Andres Freund 2025-02-01 15:50:37 Re: hash_search_with_hash_value is high in "perf top" on a replica