Re: Change GUC hashtable to use simplehash?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-11-22 21:22:21
Message-ID: 20231122212221.ahmlio2yfx7jla6z@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-11-22 15:56:21 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2023-11-21 16:42:55 +0700, John Naylor wrote:
> >> The strlen call required for hashbytes() is not free. The lack of
> >> mixing in the (probably inlined after 0001) previous hash function can
> >> remedied directly, as in the attached:
>
> > I doubt this is a good hashfunction. For short strings, sure, but after
> > that... I don't think it makes sense to reduce the internal state of a hash
> > function to something this small.
>
> GUC names are just about always short, though, so I'm not sure you've
> made your point?

With short I meant <= 6 characters (32 / 5 = 6.x). After that you're
overwriting bits that you previously set, without dispersing the "overwritten"
bits throughout the hash state.

It's pretty easy to create conflicts this way, even just on paper. E.g. I
think abcdefgg and cbcdefgw would have the same hash, because the accumulated
value passed to murmurhash32 is the same.

The fact that this happens when a large part of the string is the same
is bad, because it makes it more likely that prefixed strings trigger such
conflicts, and they're obviously common with GUC strings.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-22 21:27:56 Re: Change GUC hashtable to use simplehash?
Previous Message Bruce Momjian 2023-11-22 21:16:02 Re: Partial aggregates pushdown