Re: Hash Functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-08-16 21:58:41
Message-ID: 29806.1502920721@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Attached is a quick sketch of how this could perhaps be done (ignoring
> for the moment the relatively-boring opclass pushups). It introduces
> a new function hash_any_extended which differs from hash_any() in that
> (a) it combines both b and c into the result and (b) it accepts a seed
> which is mixed into the initial state if it's non-zero.

> Comments?

Hm. Despite the comment at lines 302-304, I'm not sure that we ought
to do this simply by using "b" as the high order bits. AFAICS that
exposes little or no additional randomness; in particular it seems
unlikely to meet Jenkins' original design goal that "every 1-bit and
2-bit delta achieves avalanche". There might be some simple way to
extend the existing code to produce a mostly-independent set of 32 more
bits, but I wonder if we wouldn't be better advised to just keep Jenkins'
code as-is and use some other method entirely for producing the
32 new result bits.

... In fact, on perusing the linked-to page
http://burtleburtle.net/bob/hash/doobs.html
Bob says specifically that taking b and c from this hash does not
produce a fully random 64-bit result. He has a new one that does,
lookup3.c, but probably he hasn't tried to make that bit-compatible
with the 1997 algorithm.

Your injection of the seed as prepended data seems unassailable from the
randomness standpoint. But I wonder whether we could do it more cheaply
by xoring the seed into the initial a/b/c values --- it's not very clear
whether those are magic in any interesting way, or just some randomly
chosen constants.

Anyway, I'd certainly suggest that whoever embarks on this for real
spend some time perusing Jenkins' website.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-16 22:09:27 Re: Atomics for heap_parallelscan_nextpage()
Previous Message Andres Freund 2017-08-16 21:55:12 Re: Function to move the position of a replication slot