Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-04-28 22:52:33
Message-ID: 20200428225233.t3xr4jhwiuxolmeh@development
Lists: pgsql-hackers

On Tue, Apr 28, 2020 at 06:22:20PM -0400, James Coleman wrote:
>I cc'd Andres given his commit introduced simplehash, so I figured
>he'd probably have a few pointers on when each one might be useful.
>
>On Tue, Apr 28, 2020 at 8:39 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
>...
>> > Any particular reasons to pick dynahash over simplehash? ISTM we're
>> > using simplehash elsewhere in the executor (grouping, tidbitmap, ...),
>> > while there are not many places using dynahash for simple short-lived
>> > hash tables. Of course, that alone is a weak reason to insist on using
>> > simplehash here, but I suppose there were reasons for not using dynahash
>> > and we'll end up facing the same issues here.
>>
>> No particular reason; it wasn't clear to me that there was a reason to
>> prefer one or the other (and I'm not acquainted with the codebase
>> enough to know the differences), so I chose dynahash because it was
>> easier to find examples to follow for implementation.
>
>Do you have any thoughts on what the trade-offs/use-cases etc. are for
>dynahash versus simple hash?
>
>From reading the commit message in b30d3ea824c it seems like simple
>hash is faster and optimized for CPU cache benefits. The comments at
>the top of simplehash.h also discourage its use in
>non-performance/space-sensitive uses, but there isn't anything I can see
>that explicitly tries to discuss when dynahash is useful, etc.
>
>Given the performance notes in that commit message, I think
>switching to simple hash is worth it.
>

I recall doing some benchmarks for that patch, but it was so long ago
that I don't really remember the details. In general, though, I agree
that simplehash is a bit more efficient in terms of CPU / caching.

I think the changes required to switch from dynahash to simplehash are
fairly small, so the best thing we can do is just try it, do some
measurements, and then decide.
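
To make the CPU / caching point a bit more concrete, here is a minimal
sketch of the general technique simplehash is built on: open addressing
with linear probing over a flat array, with the key type and hash function
compiled in rather than reached through function pointers. This is not
simplehash itself (which is generated from macros and is more elaborate),
and every name in it (IntSet, intset_create, intset_insert,
intset_contains, intset_hash) is hypothetical. It assumes int64 keys and a
table sized once up front, which roughly matches a constant array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in, not PostgreSQL code. */
typedef struct IntSet
{
    uint32_t    size;       /* number of slots, always a power of two */
    uint32_t    members;    /* number of distinct keys stored */
    int64_t    *keys;       /* slot contents, valid where used[i] is true */
    bool       *used;       /* slot occupancy flags */
} IntSet;

/* Simple 64-bit mixer; any reasonable hash function would do here. */
static uint32_t
intset_hash(int64_t key)
{
    uint64_t    h = (uint64_t) key;

    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    return (uint32_t) h;
}

/* Size the table once so the load factor stays at or below 50%. */
static IntSet *
intset_create(uint32_t nelements)
{
    IntSet     *set;
    uint32_t    size = 16;

    assert(nelements < (UINT32_C(1) << 30));
    while (size < nelements * 2)
        size <<= 1;

    set = malloc(sizeof(IntSet));
    assert(set != NULL);    /* sketch only: no real error handling */
    set->size = size;
    set->members = 0;
    set->keys = malloc(size * sizeof(int64_t));
    set->used = calloc(size, sizeof(bool));
    assert(set->keys != NULL && set->used != NULL);
    return set;
}

/* Probe consecutive slots until the key or an empty slot is found. */
static void
intset_insert(IntSet *set, int64_t key)
{
    uint32_t    mask = set->size - 1;
    uint32_t    i = intset_hash(key) & mask;

    while (set->used[i])
    {
        if (set->keys[i] == key)
            return;             /* already present */
        i = (i + 1) & mask;
    }
    /* Always leave one empty slot so that probing terminates. */
    assert(set->members + 1 < set->size);
    set->used[i] = true;
    set->keys[i] = key;
    set->members++;
}

static bool
intset_contains(const IntSet *set, int64_t key)
{
    uint32_t    mask = set->size - 1;
    uint32_t    i = intset_hash(key) & mask;

    while (set->used[i])
    {
        if (set->keys[i] == key)
            return true;
        i = (i + 1) & mask;
    }
    return false;               /* hit an empty slot: key is absent */
}

static void
intset_free(IntSet *set)
{
    free(set->keys);
    free(set->used);
    free(set);
}
```

A lookup here touches adjacent array slots and calls no function pointers.
dynahash, by contrast, follows per-bucket chains of separately allocated
entries and calls its hash and match functions indirectly. Whether that
difference is measurable for this patch is exactly what the benchmark
would need to show.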

>But I also wonder if there might be some value in a README or comments
>addition that would be a guide to what the various hash
>implementations are useful for. If there's interest, I could try to
>type something short up so that we have something to make the code
>base a bit more discoverable.
>

I wouldn't object to that. Although maybe we should simply add some
basic recommendations to the comments in dynahash/simplehash.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
