Re: Change GUC hashtable to use simplehash?

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-12-12 05:22:38
Message-ID: CANWCAZYGzFrLHWoWUQ3-d4WfjECXa=02ya_htGbfz_M_qNtA=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 10, 2023 at 2:18 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Sat, 2023-12-09 at 18:52 +0700, John Naylor wrote:
> > > I tested using the new hash function APIs for my search path cache,
> > > and
> > > there's a significant speedup for cases not benefiting from
> > > a86c61c9ee.
> > > It's enough that we almost don't need a86c61c9ee. So a definite +1
> > > to
> > > the new APIs.
> >
> > Do you have a new test?
>
> Still using the same basic test here:
>
> https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
>
> What I did is:
>
> a. add your v5 patches
> b. disable optimization in a86c61c9ee
> c. add attached patch to use new hash APIs
>
> I got a slowdown between (a) and (b), and then (c) closed the gap about
> halfway. It started to get close to test noise at that point -- I could
> get some better numbers out of it if it's helpful.

I tried my variant of the same test [1] (but only 20 seconds per run),
which uses pgbench to take the average of a few dozen runs, and
doesn't use table I/O (when doing that, it's best to pre-warm the
buffers to reduce noise).

pgbench -n -T 20 -f bench.sql -M prepared
(done three times and take the median, with turbo off)

* master at 457428d9e99b6b from Dec 4:
latency average = 571.413 ms

* v8 (bytewise hash):
latency average = 588.942 ms

This regression is a bit surprising, since there is no strlen call,
and it uses roleid as a seed without a round of mixing (not sure if we
should do that, but just trying to verify results).

* v8 with chunked interface:
latency average = 555.688 ms

This starts to improve things for me.

* v8 with chunked, and return lower 32 bits of full 64-bit hash:
latency average = 556.324 ms

This is within the noise level. There doesn't seem to be much downside
of using a couple cycles for fasthash's 32-bit reduction.

* revert back to master from Dec 4 and then cherry pick a86c61c9ee
(save last entry of SearchPathCache)
latency average = 545.747 ms

So chunked incremental hashing gets within ~2% of that, which is nice.
It seems we should use that when removing strlen, when convenient.

Updated next steps:
* Investigate whether/how to incorporate final length into the
calculation when we don't have the length up front.
* Add some desperately needed explanatory comments.
* Use this in some existing cases where it makes sense.
* Get back to GUC hash and dynahash.

[1] https://www.postgresql.org/message-id/CANWCAZY7Cr-GdUhrCLoR4%2BJGLChTb0pQxq9ZPi1KTLs%2B_KDFqg%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-12-12 08:23:46 Add isCatalogRel in rmgrdesc
Previous Message Dilip Kumar 2023-12-12 04:57:09 Re: Adding facility for injection points (or probe points?) for more advanced tests