Re: Change GUC hashtable to use simplehash?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-11-17 23:27:12
Message-ID: 20231117232712.mpnzsho7ydssg2oc@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-11-17 14:08:56 -0800, Jeff Davis wrote:
> On Fri, 2023-11-17 at 17:04 -0500, Tom Lane wrote:
> > I can't imagine wanting to convert *every* hashtable in the system
> > to simplehash; the added code bloat would be unreasonable.  So yeah,
> > I think we'll have two mechanisms indefinitely.  That's not to say
> > that we might not rewrite hsearch.  But simplehash was never meant
> > to be a universal solution.
>
> OK, I will withdraw the patch until/unless it provides a concrete
> benefit.

It might already provide a concrete benefit in the space domain:

SELECT count(*), sum(total_bytes) total_bytes, sum(total_nblocks) total_nblocks, sum(free_bytes) free_bytes, sum(free_chunks) free_chunks, sum(used_bytes) used_bytes
FROM pg_backend_memory_contexts
WHERE name LIKE 'GUC%';

HEAD:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     2 │       57344 │             5 │      25032 │          10 │      32312 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘

your patch:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     1 │       36928 │             3 │      12360 │           3 │      24568 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘

However, it fares less well performance-wise at larger numbers of GUCs. At
first I thought that was largely because you aren't using SH_STORE_HASH. With
that enabled, it's slower when creating a large number of GUCs, but a good bit
faster at retrieving them. But that slowness didn't seem right.
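
For reference, turning on SH_STORE_HASH is mostly a matter of adding a hash
field to the element and two defines in the instantiation. A minimal sketch -
the element layout and the string-taking hash helper are assumptions for
illustration, not lifted from the patch; guc_name_compare() is the existing
case-insensitive comparator in guc.c:

/*
 * Hypothetical simplehash instantiation for a GUC name -> variable map,
 * shown only to illustrate SH_STORE_HASH.
 */
typedef struct GUCNameHashEntry
{
    const char *name;                   /* hash key */
    struct config_generic *gucvar;      /* associated GUC variable */
    uint32      hash;                   /* hash cached via SH_STORE_HASH */
    char        status;                 /* simplehash bookkeeping */
} GUCNameHashEntry;

#define SH_PREFIX       guc_name
#define SH_ELEMENT_TYPE GUCNameHashEntry
#define SH_KEY_TYPE     const char *
#define SH_KEY          name
#define SH_HASH_KEY(tb, key)    guc_name_hash_string(key)   /* hypothetical helper */
#define SH_EQUAL(tb, a, b)      (guc_name_compare(a, b) == 0)
#define SH_STORE_HASH           /* cache the hash in each entry ... */
#define SH_GET_HASH(tb, a)      ((a)->hash)  /* ... in this field */
#define SH_SCOPE        static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

With the hash cached per entry, probes and table growth compare the stored
hashes first and only fall back to the expensive case-insensitive string
comparison when they match.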

Then I noticed that memory usage was too large when creating many GUCs. A bit
of debugging later, I figured out that that's due to guc_name_hash() being
terrifyingly bad: there's no bit mixing whatsoever! That leads to very large
numbers of hash collisions, which simplehash tries to defend against somewhat
by making the table larger.

(gdb) p guc_name_hash("andres.c2")
$14 = 3798554171
(gdb) p guc_name_hash("andres.c3")
$15 = 3798554170
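
For illustration (the shape of the problem, not the guc.c code verbatim): a
rotate-and-xor loop with no final mixing leaves the last character only in the
low bits, so names that differ in their final character hash to adjacent
values, exactly as above. Running the result through a murmur-style finalizer
is one cheap fix; in-tree, hash_bytes() or murmurhash32() from
src/include/common/hashfn.h would do the same job.

#include <stdint.h>

/* Rotate-and-xor name hash with no bit mixing at the end. */
static uint32_t
weak_name_hash(const char *name)
{
    uint32_t    result = 0;

    for (; *name; name++)
    {
        uint32_t    ch = (uint32_t) (unsigned char) *name;

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';    /* case-fold, as the GUC code does */

        result = (result << 5) | (result >> 27);    /* rotate left by 5 */
        result ^= ch;           /* the last char ends up in the low bits */
    }
    return result;
}

/* Possible fix: finish with a murmur3-style finalizer to spread all bits. */
static uint32_t
mixed_name_hash(const char *name)
{
    uint32_t    h = weak_name_hash(name);

    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}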

Fixing that makes simplehash always faster, but it still doesn't win on memory
usage at the upper end - the two pointers in GUCHashEntry make the entries too
big.
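
For comparison, a slimmer (hypothetical) element would carry only the
config_generic pointer - the name can be read through gucvar->name - plus the
cached hash:

/* Hypothetical slimmer element; the key is not duplicated in the entry. */
typedef struct GUCHashEntrySlim
{
    struct config_generic *gucvar;  /* value; gucvar->name doubles as the key */
    uint32      hash;               /* cached hash for SH_STORE_HASH */
    char        status;             /* simplehash bookkeeping */
} GUCHashEntrySlim;

On a 64-bit build that's roughly 16 bytes per slot after padding instead of 24,
though simplehash wants the key stored as a member of the element, so keying
through gucvar->name would take a bit more surgery than just shrinking the
struct.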

I think, independent of this patch, it might be worth requiring that hash
table lookups apply the transformation before the lookup. A comparison
function this expensive is not great...
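
A sketch of what that could look like, assuming the transformation in question
is the per-character case folding done by guc_name_compare() (helper and
lookup names below are made up for illustration):

#include <stddef.h>

/* Hypothetical helper: case-fold a GUC name once, at the call site. */
static void
fold_guc_name(const char *name, char *dst, size_t dstlen)
{
    size_t      i;

    for (i = 0; i < dstlen - 1 && name[i] != '\0'; i++)
    {
        char        ch = name[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';    /* same folding rule as guc_name_compare() */
        dst[i] = ch;
    }
    dst[i] = '\0';
}

/*
 * A lookup would then fold once and probe with the already-folded key:
 *
 *     char    folded[128];    (buffer size picked arbitrarily here)
 *
 *     fold_guc_name(name, folded, sizeof(folded));
 *     entry = guc_name_lookup(hashtab, folded);    (hypothetical lookup)
 *
 * so the hash and equality functions can treat the key as an exact string
 * instead of case-folding every character on every probe.
 */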

Greetings,

Andres Freund
