Re: Hash Functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Hash Functions
Date: 2017-08-03 22:47:02
Message-ID: CA+TgmoYe2CxQi-B507JTT07vpygVHt=SzNr+ZLs4yW7+Fvmafw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 3, 2017 at 6:08 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> That's another way to go, but it requires inventing a way to thread
>> the IV through the hash opclass interface.
>
> Only if we really want to do it really well :P. Using a hash_combine()
> like
>
> /*
> * Combine two hash values, resulting in another hash value, with decent bit
> * mixing.
> *
> * Similar to boost's hash_combine().
> */
> static inline uint32
> hash_combine(uint32 a, uint32 b)
> {
> a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
> return a;
> }
>
> between hash(IV) and the hashfunction should do the trick (the IV needs
> to hashed once, otherwise the bit mix is bad).

That seems pretty lame, although it's sufficient to solve the
immediate problem, and I have to admit to a certain predilection for
things that solve the immediate problem without creating lots of
additional work.

>> We could:
>>
>> - Invent a new hash_partition AM that doesn't really make indexes but
>> supplies hash functions for hash partitioning.
>> - Add a new, optional support function 2 to the hash AM that takes a
>> value of the type *and* an IV as an argument.
>> - Something else.
>
> Not arguing for it, but one option could also have pg_type.hash*
> function(s).

True. That is a bit less configurable because you can't then have
multiple functions for the same type. Going through the opclass
interface means you can have hash_portable_ops and hash_fast_ops and
let people choose. But this would be easy to implement and enough for
most people in practice.

> One thing that I think might be advisable to think about is that we're
> atm stuck with a relatively bad hash function for hash indexes (and hash
> joins/aggs), and we should probably evolve it at some point. At the same
> time there's currently people out there relying on the current hash
> functions remaining stable.

That to me is a bit of a separate problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-03 22:50:45 analyzeCTE is too strict about typmods?
Previous Message Tom Lane 2017-08-03 22:24:55 Re: PL_stashcache, or, what's our minimum Perl version?