From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | James Coleman <jtc331(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays |
Date: | 2020-04-29 15:17:35 |
Message-ID: | 20200429151735.wa45dxiwmuz3il6k@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote:
>On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> Hi,
>>
>> On 2020-04-28 18:22:20 -0400, James Coleman wrote:
>> > I cc'd Andres given his commit introduced simplehash, so I figured
>> > he'd probably have a few pointers on when each one might be useful.
>> > [...]
>> > Do you have any thoughts on what the trade-offs/use-cases etc. are for
>> > dynahash versus simple hash?
>> >
>> > From reading the commit message in b30d3ea824c it seems like simple
>> > hash is faster and optimized for CPU cache benefits. The comments at
>> > the top of simplehash.h also discourage it's use in non
>> > performance/space sensitive uses, but there isn't anything I can see
>> > that explicitly tries to discuss when dynahash is useful, etc.
>>
>> Benefits of dynahash (chained hashtable):
>> - supports partitioning, useful for shared memory accessed under locks
>> - better performance for large entries, as they don't need to be moved
>> around in case of hash conflicts
>> - stable pointers to hash table entries
>>
>> Benefits of simplehash (open addressing hash table):
>> - no indirect function calls, known structure sizes, due to "templated"
>> code generation (these show up substantially in profiles for dynahash)
>> - considerably faster for small entries due to previous point, and due
>> open addressing hash tables having better cache behaviour than chained
>> hashtables
>> - once set-up the interface is type safe and easier to use
>> - no overhead of a separate memory context etc
>>
>>
>> > Given the performance notes in that commit message, I thinking
>> > switching to simple hash is worth it.
>>
>> Seems plausible to me.
>>
>>
>> > But I also wonder if there might be some value in a README or comments
>> > addition that would be a guide to what the various hash
>> > implementations are useful for. If there's interest, I could try to
>> > type something short up so that we have something to make the code
>> > base a bit more discoverable.
>>
>> That'd make sense to me.
>
>Cool, I'll work on that as I have time then.
>
>One question: what is the reasoning behind having SH_STORE_HASH? The
>only things I could imagine would be a case where you have external
>pointers to some set of values or need to be able to use the hash for
>other reasons besides the hash table (and so can avoid calculating it
>twice), but maybe I'm missing something.
>
I believe it's because computing the hash may be fairly expensive for
some data types, in which case it may be better to just store it for
future use.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Antonin Houska | 2020-04-29 15:28:33 | Re: Accidental use of the PVC_RECURSE_WINDOWFUNCS flag? |
Previous Message | Antonin Houska | 2020-04-29 14:45:20 | Accidental use of the PVC_RECURSE_WINDOWFUNCS flag? |