From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Change GUC hashtable to use simplehash? |
Date: | 2023-12-06 00:39:00 |
Message-ID: | CANWCAZZYAZUH_3GYUtETmLOcd-5oD+kVC3cX889F7tzAQwtOsA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Dec 5, 2023 at 1:57 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2023-12-04 at 12:12 +0700, John Naylor wrote:
> There's already a patch to use simplehash, and the API is a bit
> cleaner, and there's a minor performance improvement. It seems fairly
> non-controversial -- should I just proceed with that patch?
I won't object if you want to commit that piece now, but I hesitate to
call it a performance improvement on its own.
- The runtime measurements I saw reported were well within the noise level.
- The memory usage starts out better, but with more entries is worse.
> > From my point of view, it would at least be useful for C-strings,
> > where we don't have the length available up front.
>
> That's good news.
>
> By the way, is there any reason that we would need hash_bytes(s,
> strlen(s)) == cstring_hash(s)?
"git grep cstring_hash" found nothing, so not sure what you're asking.
> Each approach has its own optimization techniques. In (a), we can use
> the |= 0x20 trick, and for equality do a memcmp() check first.
I will assume you are referring to semantics, but on the odd chance
readers take this to mean the actual C library call, that wouldn't be
an optimization, that'd be a pessimization.
> As a tangential point, we may eventually want to provide a more
> internationalized definition of "case insensitive" for GUC names. That
> would be slightly easier with (b) than with (a), but we can cross that
> bridge if and when we come to it.
The risk/reward ratio seems pretty bad.
> It seems you are moving toward (a) whereas my patches moved toward (b).
> I am fine with either approach but I wanted to clarify which approach
> we are using.
I will make my case:
> In the abstract, I kind of like approach (b) because we don't need to
> be as special/clever with the hash functions.
In the abstract, I consider (b) to be a layering violation. As a
consequence, the cleverness in (b) is not confined to one or two
places, but is smeared over a whole bunch of places. I find it hard to
follow.
Concretely, it also adds another pointer to the element struct. That's
not good for a linear open-addressing array, which simplehash has.
Further, remember the equality function is important as well. In v3,
it was "strcmp(a,b)==0", which is a holdover from the dynahash API.
One of the advantages of the simplehash API is that we can 1) use an
equality function, which should be slightly cheaper than a full
comparison function, and 2) we have the option to inline it. (It
doesn't make sense in turn, to jump to a shared lib page and invoke an
indirect function call.) Once we've done that, it's already "special",
so it's not a stretch to make it do what we want to begin with. If a
nicer API is important, why not use it?
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2023-12-06 00:41:06 | Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans) |
Previous Message | Jeff Davis | 2023-12-05 23:55:11 | Re: Faster "SET search_path" |