Re: Change GUC hashtable to use simplehash?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-11-17 23:27:12
Message-ID: 20231117232712.mpnzsho7ydssg2oc@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-11-17 14:08:56 -0800, Jeff Davis wrote:
> On Fri, 2023-11-17 at 17:04 -0500, Tom Lane wrote:
> > I can't imagine wanting to convert *every* hashtable in the system
> > to simplehash; the added code bloat would be unreasonable.  So yeah,
> > I think we'll have two mechanisms indefinitely.  That's not to say
> > that we might not rewrite hsearch.  But simplehash was never meant
> > to be a universal solution.
>
> OK, I will withdraw the patch until/unless it provides a concrete
> benefit.

It might already provide a concrete benefit in the space domain:

SELECT count(*), sum(total_bytes) total_bytes, sum(total_nblocks) total_nblocks, sum(free_bytes) free_bytes, sum(free_chunks) free_chunks, sum(used_bytes) used_bytes
FROM pg_backend_memory_contexts
WHERE name LIKE 'GUC%';

HEAD:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     2 │       57344 │             5 │      25032 │          10 │      32312 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘

your patch:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     1 │       36928 │             3 │      12360 │           3 │      24568 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘

However, it fares less well performance-wise at larger numbers of GUCs. At
first I thought that was largely because you aren't using SH_STORE_HASH. With
that enabled, it's slower when creating a large number of GUCs, but a good bit
faster at retrieving them. But that slowness didn't seem right.
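
For reference, turning on SH_STORE_HASH is mostly a matter of adding a hash
field to the element and two defines in the instantiation. A minimal sketch -
the element layout and the string-taking hash helper are assumptions for
illustration, not lifted from the patch; guc_name_compare() is the existing
case-insensitive comparator in guc.c:

/*
 * Hypothetical simplehash instantiation for a GUC name -> variable map,
 * shown only to illustrate SH_STORE_HASH.
 */
typedef struct GUCNameHashEntry
{
    const char *name;                   /* hash key */
    struct config_generic *gucvar;      /* associated GUC variable */
    uint32      hash;                   /* hash cached via SH_STORE_HASH */
    char        status;                 /* simplehash bookkeeping */
} GUCNameHashEntry;

#define SH_PREFIX       guc_name
#define SH_ELEMENT_TYPE GUCNameHashEntry
#define SH_KEY_TYPE     const char *
#define SH_KEY          name
#define SH_HASH_KEY(tb, key)    guc_name_hash_string(key)   /* hypothetical helper */
#define SH_EQUAL(tb, a, b)      (guc_name_compare(a, b) == 0)
#define SH_STORE_HASH           /* cache the hash in each entry ... */
#define SH_GET_HASH(tb, a)      ((a)->hash)  /* ... in this field */
#define SH_SCOPE        static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

With the hash cached per entry, probes and table growth compare the stored
hashes first and only fall back to the expensive case-insensitive string
comparison when they match.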

Then I noticed that memory usage was too large when creating many GUCs. A bit
of debugging later, I figured out that that's due to guc_name_hash() being
terrifyingly bad: there's no bit mixing whatsoever! That leads to very large
numbers of hash collisions, which simplehash tries to defend against somewhat
by making the table larger.

(gdb) p guc_name_hash("andres.c2")
$14 = 3798554171
(gdb) p guc_name_hash("andres.c3")
$15 = 3798554170
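
For illustration (the shape of the problem, not the guc.c code verbatim): a
rotate-and-xor loop with no final mixing leaves the last character only in the
low bits, so names that differ in their final character hash to adjacent
values, exactly as above. Running the result through a murmur-style finalizer
is one cheap fix; in-tree, hash_bytes() or murmurhash32() from
src/include/common/hashfn.h would do the same job.

#include <stdint.h>

/* Rotate-and-xor name hash with no bit mixing at the end. */
static uint32_t
weak_name_hash(const char *name)
{
    uint32_t    result = 0;

    for (; *name; name++)
    {
        uint32_t    ch = (uint32_t) (unsigned char) *name;

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';    /* case-fold, as the GUC code does */

        result = (result << 5) | (result >> 27);    /* rotate left by 5 */
        result ^= ch;           /* the last char ends up in the low bits */
    }
    return result;
}

/* Possible fix: finish with a murmur3-style finalizer to spread all bits. */
static uint32_t
mixed_name_hash(const char *name)
{
    uint32_t    h = weak_name_hash(name);

    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}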

Fixing that makes simplehash always faster, but it still doesn't win on memory
usage at the upper end - the two pointers in GUCHashEntry make the entries too
big.
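
For comparison, a slimmer (hypothetical) element would carry only the
config_generic pointer - the name can be read through gucvar->name - plus the
cached hash:

/* Hypothetical slimmer element; the key is not duplicated in the entry. */
typedef struct GUCHashEntrySlim
{
    struct config_generic *gucvar;  /* value; gucvar->name doubles as the key */
    uint32      hash;               /* cached hash for SH_STORE_HASH */
    char        status;             /* simplehash bookkeeping */
} GUCHashEntrySlim;

On a 64-bit build that's roughly 16 bytes per slot after padding instead of 24,
though simplehash wants the key stored as a member of the element, so keying
through gucvar->name would take a bit more surgery than just shrinking the
struct.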

I think, independent of this patch, it might be worth requiring that hash
table lookups apply the transformation before the lookup. A comparison
function this expensive is not great...
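
A sketch of what that could look like, assuming the transformation in question
is the per-character case folding done by guc_name_compare() (helper and
lookup names below are made up for illustration):

#include <stddef.h>

/* Hypothetical helper: case-fold a GUC name once, at the call site. */
static void
fold_guc_name(const char *name, char *dst, size_t dstlen)
{
    size_t      i;

    for (i = 0; i < dstlen - 1 && name[i] != '\0'; i++)
    {
        char        ch = name[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';    /* same folding rule as guc_name_compare() */
        dst[i] = ch;
    }
    dst[i] = '\0';
}

/*
 * A lookup would then fold once and probe with the already-folded key:
 *
 *     char    folded[128];    (buffer size picked arbitrarily here)
 *
 *     fold_guc_name(name, folded, sizeof(folded));
 *     entry = guc_name_lookup(hashtab, folded);    (hypothetical lookup)
 *
 * so the hash and equality functions can treat the key as an exact string
 * instead of case-folding every character on every probe.
 */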

Greetings,

Andres Freund
