Re: Change GUC hashtable to use simplehash?

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-12-15 01:20:12
Message-ID: CANWCAZa6VsgeAOcPGYf4jajvahiMzDQaJ5fw8X_dOVFAADCymA@mail.gmail.com
Lists: pgsql-hackers

I wrote:
>
> * v8 with chunked interface:
> latency average = 555.688 ms
>
> This starts to improve things for me.
>
> * v8 with chunked, and return lower 32 bits of full 64-bit hash:
> latency average = 556.324 ms
>
> This is within the noise level. There doesn't seem to be much downside
> of using a couple cycles for fasthash's 32-bit reduction.
>
> * revert back to master from Dec 4 and then cherry pick a86c61c9ee
> (save last entry of SearchPathCache)
> latency average = 545.747 ms
>
> So chunked incremental hashing gets within ~2% of that, which is nice.
> It seems we should use that when removing strlen, when convenient.
>
> Updated next steps:
> * Investigate whether/how to incorporate final length into the
> calculation when we don't have the length up front.
> * Add some desperately needed explanatory comments.
> * Use this in some existing cases where it makes sense.
> * Get back to GUC hash and dynahash.

For #1 here, I cloned SMHasher and was dismayed at the complete lack
of documentation, but after some poking around, I found how to run the
tests, using the 32-bit hash to save time. It turns out that the input
length is important. I've attached two files of results. "nolen"
means the hash no longer uses the initial length to tweak the internal
seed. As you can see, there are 8 failures. "pluslen" means I then
incorporated the length within the finalizer. This *does* pass
SMHasher, so that's good. (Of course, this way can't produce the same
hash as when we know the length up front, but that's not important.)
The attached patches show how that would work, with some further
whacking around and testing with Jeff's prototype for the search path
cache hash table. I'll work on code comments and get it polished.
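
To make the "pluslen" idea concrete, here is a minimal sketch of chunked
hashing of a NUL-terminated string where the length is only folded in at
the finalizer. This is not the code in the attached patches: every
sketch_* name is hypothetical, the constants are the fasthash ones as I
recall them from its reference code, and XORing the length into the state
just before the final mix is an assumption about one way a finalizer
"tweak" could work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; constants recalled from fasthash. */
#define SKETCH_M UINT64_C(0x880355f21e6d1965)

typedef struct sketch_hs
{
	uint64_t	accum;		/* current chunk, up to 8 bytes */
	uint64_t	hash;		/* running hash state */
} sketch_hs;

static inline uint64_t
sketch_mix(uint64_t h)
{
	h ^= h >> 23;
	h *= UINT64_C(0x2127599bf4325c37);
	h ^= h >> 47;
	return h;
}

/* "nolen" style init: the seed is not tweaked by the input length. */
static inline void
sketch_init(sketch_hs *hs, uint64_t seed)
{
	hs->accum = 0;
	hs->hash = seed;
}

/* Fold the current chunk into the state and clear it for the next one. */
static inline void
sketch_combine(sketch_hs *hs)
{
	hs->hash ^= sketch_mix(hs->accum);
	hs->hash *= SKETCH_M;
	hs->accum = 0;
}

/* "pluslen" style finalizer: mix in a tweak (here, the final length). */
static inline uint64_t
sketch_final64(const sketch_hs *hs, uint64_t tweak)
{
	return sketch_mix(hs->hash ^ tweak);
}

/* fasthash-style reduction of the 64-bit hash to 32 bits. */
static inline uint32_t
sketch_reduce32(uint64_t h)
{
	return (uint32_t) (h - (h >> 32));
}

/* Hash a C string without calling strlen() first. */
static uint64_t
sketch_hash_cstring64(const char *str, uint64_t seed)
{
	sketch_hs	hs;
	size_t		len = 0;

	sketch_init(&hs, seed);
	for (;;)
	{
		size_t		n = 0;

		/* Take up to 8 bytes, stopping at the terminator. */
		while (n < 8 && str[len + n] != '\0')
			n++;
		if (n == 0)
			break;
		memcpy(&hs.accum, str + len, n);	/* accum is zero here */
		sketch_combine(&hs);
		len += n;
		if (n < 8)
			break;
	}
	/* The length is known only now, so it goes into the finalizer. */
	return sketch_final64(&hs, (uint64_t) len);
}

static uint32_t
sketch_hash_cstring(const char *str, uint64_t seed)
{
	return sketch_reduce32(sketch_hash_cstring64(str, seed));
}
```

A caller would use sketch_hash_cstring("search_path", 0) in place of
strlen() followed by a length-up-front hash. As noted above, the result
differs from what the length-up-front variant returns for the same bytes,
so the two must not be mixed within one hash table.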

Attachment                                                        Content-Type         Size
fasthash32_nolen.txt                                              text/plain           24.2 KB
fasthash32_pluslen.txt                                            text/plain           24.9 KB
v9-0004-Assert-that-incremental-fasthash-variants-give-th.patch   application/x-patch  2.8 KB
v9-0006-Add-optional-tweak-to-finalizer.patch                     application/x-patch  3.6 KB
v9-0002-Rewrite-fasthash-functions-using-a-homegrown-incr.patch   application/x-patch  5.8 KB
v9-0003-Fix-alignment-issue-in-the-original-fastash.patch         application/x-patch  815 bytes
v9-0005-Remove-ULL.patch                                          application/x-patch  1.9 KB
v9-0001-Vendor-fasthash.patch                                     application/x-patch  3.0 KB
