Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-04-29 15:50:19
Message-ID: 20200429155019.pq2rpepvfd5r56cp@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 29, 2020 at 11:34:24AM -0400, James Coleman wrote:
>On Wed, Apr 29, 2020 at 11:17 AM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote:
>> >On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> >>
>> >> Hi,
>> >>
>> >> On 2020-04-28 18:22:20 -0400, James Coleman wrote:
>> >> > I cc'd Andres given his commit introduced simplehash, so I figured
>> >> > he'd probably have a few pointers on when each one might be useful.
>> >> > [...]
>> >> > Do you have any thoughts on what the trade-offs/use-cases etc. are for
>> >> > dynahash versus simple hash?
>> >> >
>> >> > From reading the commit message in b30d3ea824c it seems like simple
>> >> > hash is faster and optimized for CPU cache benefits. The comments at
>> >> > the top of simplehash.h also discourage it's use in non
>> >> > performance/space sensitive uses, but there isn't anything I can see
>> >> > that explicitly tries to discuss when dynahash is useful, etc.
>> >>
>> >> Benefits of dynahash (chained hashtable):
>> >> - supports partitioning, useful for shared memory accessed under locks
>> >> - better performance for large entries, as they don't need to be moved
>> >> around in case of hash conflicts
>> >> - stable pointers to hash table entries
>> >>
>> >> Benefits of simplehash (open addressing hash table):
>> >> - no indirect function calls, known structure sizes, due to "templated"
>> >> code generation (these show up substantially in profiles for dynahash)
>> >> - considerably faster for small entries due to previous point, and due
>> >> open addressing hash tables having better cache behaviour than chained
>> >> hashtables
>> >> - once set-up the interface is type safe and easier to use
>> >> - no overhead of a separate memory context etc
>> >>
>> >>
>> >> > Given the performance notes in that commit message, I thinking
>> >> > switching to simple hash is worth it.
>> >>
>> >> Seems plausible to me.
>> >>
>> >>
>> >> > But I also wonder if there might be some value in a README or comments
>> >> > addition that would be a guide to what the various hash
>> >> > implementations are useful for. If there's interest, I could try to
>> >> > type something short up so that we have something to make the code
>> >> > base a bit more discoverable.
>> >>
>> >> That'd make sense to me.
>> >
>> >Cool, I'll work on that as I have time then.
>> >
>> >One question: what is the reasoning behind having SH_STORE_HASH? The
>> >only things I could imagine would be a case where you have external
>> >pointers to some set of values or need to be able to use the hash for
>> >other reasons besides the hash table (and so can avoid calculating it
>> >twice), but maybe I'm missing something.
>> >
>>
>> I believe it's because computing the hash may be fairly expensive for
>> some data types, in which case it may be better to just store it for
>> future use.
>
>But is it storing it for use primarily by the hash table
>implementation (i.e., does it need the hash stored this way to avoid
>repeated recalculation) or for caller's use?
>

For the hash table implementation, I think. Simplehash is using "robin
hood" hashing, which needs to decide which entry to move in case of
collision. AFAIC that requires the knowledge of hash value for both the
new and existing entry, and having to compute that over and over would
be fairly expensive. But that's my understanding, I might be wrong and
it's useful for external use too.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vinicius Abrahao 2020-04-29 15:56:11 SEQUENCE values (duplicated) in some corner cases when crash happens
Previous Message Erik Rijkers 2020-04-29 15:35:22 Re: Additional Chapter for Tutorial - (review first half of 0003)