Quick Links

Re: Hash Functions

From:	amul sul <sulamul(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Subject:	Re: Hash Functions
Date:	2017-08-31 12:40:28
Message-ID:	CAAJ_b96jYefFEXUgVqhLbNh=fM9NfXcx6RLktr9nZC-WNvsucg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> > Thanks for the suggestion, I have updated 0002-patch accordingly.
> > Using this I found some strange behaviours as follow:
> >
> > 1) standard and extended0 output for the jsonb_hash case is not same.
> > 2) standard and extended0 output for the hash_range case is not same when
> > input
> > is int4range(550274, 1550274) other case in the patch are fine. This
> can
> > be
> > reproducible with other values as well, for e.g. int8range(1570275,
> > 208112489).
> >
> > Will look into this tomorrow.
>
> Those sound like bugs in your patch. Specifically:
>
> + /* Same approach as hash_range */
> + result = hash_uint32_extended((uint32) flags, seed);
> + result ^= lower_hash;
> + result = (result << 1) | (result >> 63);
> + result ^= upper_hash;
>
>

Yes, you are correct.

>
>
> That doesn't give compatible results. The easiest thing to do might
> be to rotate the high 32 bits and the low 32 bits separately.
> JsonbHashScalarValueExtended has the same problem. Maybe create a
> helper function that does something like this (untested):
>
> ((x << 1) & UINT64COUNT(0xfffffffefffffffe)) | ((x >> 31) &
> UINT64CONST(0x100000001))
>
>

This working as expected, also tested by executing the following SQL
multiple times:

SELECT v as value, hash_range(v)::bit(32) as standard,
hash_range_extended(v, 0)::bit(32) as extended0,
hash_range_extended(v, 1)::bit(32) as extended1
FROM (VALUES (int8range(floor(random() * 100)::int8, floor(random() *
1000)::int8)),
(int8range(floor(random() * 1000)::int8, floor(random() *
10000)::int8)),
(int8range(floor(random() * 10000)::int8, floor(random() *
100000)::int8)),
(int8range(floor(random() * 10000000)::int8, floor(random() *
100000000)::int8)),
(int8range(floor(random() * 100000000)::int8, floor(random() *
1000000000)::int8))) x(v)
WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);

> > Another case which I want to discuss is, extended and standard version of
> > hashfloat4, hashfloat8 & hash_numeric function will have the same output
> for
> > zero
> > value irrespective of seed value. Do you think we need a fix for this?
>
> Yes, I think you should return the seed rather than 0 in the cases
> where the current code hard-codes a 0 return. That will give the same
> behavior in the seed == 0 case while cheaply getting us something a
> bit different when there is a seed.
>
> Fixed in the attached version.

> BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
> patch.
>
> Added i
n the attached version
.

Thanks for your all the suggestions.

Regards,
Amul

Attachment	Content-Type	Size
0001-add-optional-second-hash-proc-v4.patch	application/octet-stream	63.0 KB
0002-test-Hash_functions_v3_wip.patch	application/octet-stream	27.0 KB

In response to

Re: Hash Functions at 2017-08-30 15:35:27 from Robert Haas

Responses

Re: Hash Functions at 2017-09-01 02:31:25 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fabien COELHO	2017-08-31 12:40:45	Re: pgbench tap tests & minor fixes.
Previous Message	Tom Lane	2017-08-31 12:34:11	Re: log_destination=file