From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Change GUC hashtable to use simplehash? |
Date: | 2023-12-04 18:57:27 |
Message-ID: | d1b0cd83a75d397616cc4c2fd3106a0ec1a777f3.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 2023-12-04 at 12:12 +0700, John Naylor wrote:
> That's a good thing to clear up. This thread has taken simplehash as
> a
> starting point from the very beginning. It initially showed no
> improvement, and then we identified problems with the hashing and
> equality computations. The latter seem like independently commitable
> improvements, so I'm curious if they help on their own, even if we
> still need to switch to simplehash as a last step to meet your
> performance goals.
There's already a patch to use simplehash, and the API is a bit
cleaner, and there's a minor performance improvement. It seems fairly
non-controversial -- should I just proceed with that patch?
> > If I understood what Andres was saying, the exposed hash state
> > would be
> > useful for writing a hash function like guc_name_hash().
>
> From my point of view, it would at least be useful for C-strings,
> where we don't have the length available up front.
That's good news.
By the way, is there any reason that we would need hash_bytes(s,
strlen(s)) == cstring_hash(s)?
> > Also, while the |= 0x20 is a nice trick for lowercasing, did we
> > decide
> > that it's better than my approach in patch 0004 here:
> >
> > https://www.postgresql.org/message-id/27a7a289d5b8f42e1b1e79b1bcaeef3a40583bd2.camel@j-davis.com
> >
> > which optimizes exact hits (most GUC names are already folded)
> > before
> > trying case folding?
>
> Note there were two aspects there: hashing and equality. I
> demonstrated in
>
> https://www.postgresql.org/message-id/CANWCAZbQ30O9j-bEZ_1zVCyKPpSjwbE4u19cSDDBJ%3DTYrHvPig%40mail.gmail.com
>
> ... in v4-0003 that the equality function can be optimized for
> already-folded names (and in fact measured almost equally) using way,
> way, way less code.
Thinking in terms of API layers, there are two approaches: (a) make the
hash and equality functions aware of the case-insensitivity, as we
currently do; or (b) make it the caller's responsibility to do case
folding, and the hash and equality functions are based on exact
equality.
Each approach has its own optimization techniques. In (a), we can use
the |= 0x20 trick, and for equality do a memcmp() check first. In (b),
the caller can first try lookup of the key in whatever form is
provided, and only if that fails, case-fold it and try again.
As a tangential point, we may eventually want to provide a more
internationalized definition of "case insensitive" for GUC names. That
would be slightly easier with (b) than with (a), but we can cross that
bridge if and when we come to it.
It seems you are moving toward (a) whereas my patches moved toward (b).
I am fine with either approach but I wanted to clarify which approach
we are using.
In the abstract, I kind of like approach (b) because we don't need to
be as special/clever with the hash functions. We would still want the
faster hash for C-strings, but that's general and helps all callers.
But you're right that it's more code, and that's not great.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias van de Meent | 2023-12-04 19:21:12 | Re: Extensible storage manager API - SMGR hook Redux |
Previous Message | Davin Shearer | 2023-12-04 18:37:06 | Re: Emitting JSON to file using COPY TO |