Re: Hash Functions

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-08-16 22:27:38
Message-ID: 20170816222738.GP13072@aart.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 16, 2017 at 05:58:41PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Attached is a quick sketch of how this could perhaps be done (ignoring
> > for the moment the relatively-boring opclass pushups). It introduces
> > a new function hash_any_extended which differs from hash_any() in that
> > (a) it combines both b and c into the result and (b) it accepts a seed
> > which is mixed into the initial state if it's non-zero.
>
> > Comments?
>
> Hm. Despite the comment at lines 302-304, I'm not sure that we ought
> to do this simply by using "b" as the high order bits. AFAICS that
> exposes little or no additional randomness; in particular it seems
> unlikely to meet Jenkins' original design goal that "every 1-bit and
> 2-bit delta achieves avalanche". There might be some simple way to
> extend the existing code to produce a mostly-independent set of 32 more
> bits, but I wonder if we wouldn't be better advised to just keep Jenkins'
> code as-is and use some other method entirely for producing the
> 32 new result bits.
>
> ... In fact, on perusing the linked-to page
> http://burtleburtle.net/bob/hash/doobs.html
> Bob says specifically that taking b and c from this hash does not
> produce a fully random 64-bit result. He has a new one that does,
> lookup3.c, but probably he hasn't tried to make that bit-compatible
> with the 1997 algorithm.
>

Hi,

The updated hash functions that we currently use are based on Bob Jenkins
lookup3.c and using b as the higher order bits is pretty darn good. I had
lobbied to present the 64-bit b+c hash in the original work for similar
reasons. We are definitely not using a lookup2.c version from 1997.

Regards,
Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-16 22:31:04 Re: Atomics for heap_parallelscan_nextpage()
Previous Message Andres Freund 2017-08-16 22:25:03 Re: Atomics for heap_parallelscan_nextpage()