From: | amul sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
Subject: | Re: Hash Functions |
Date: | 2017-08-31 12:40:28 |
Message-ID: | CAAJ_b96jYefFEXUgVqhLbNh=fM9NfXcx6RLktr9nZC-WNvsucg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> > Thanks for the suggestion, I have updated 0002-patch accordingly.
> > Using this I found some strange behaviours as follow:
> >
> > 1) standard and extended0 output for the jsonb_hash case is not same.
> > 2) standard and extended0 output for the hash_range case is not same when
> > input
> > is int4range(550274, 1550274) other case in the patch are fine. This
> can
> > be
> > reproducible with other values as well, for e.g. int8range(1570275,
> > 208112489).
> >
> > Will look into this tomorrow.
>
> Those sound like bugs in your patch. Specifically:
>
> + /* Same approach as hash_range */
> + result = hash_uint32_extended((uint32) flags, seed);
> + result ^= lower_hash;
> + result = (result << 1) | (result >> 63);
> + result ^= upper_hash;
>
>
Yes, you are correct.
>
>
> That doesn't give compatible results. The easiest thing to do might
> be to rotate the high 32 bits and the low 32 bits separately.
> JsonbHashScalarValueExtended has the same problem. Maybe create a
> helper function that does something like this (untested):
>
> ((x << 1) & UINT64COUNT(0xfffffffefffffffe)) | ((x >> 31) &
> UINT64CONST(0x100000001))
>
>
This working as expected, also tested by executing the following SQL
multiple times:
SELECT v as value, hash_range(v)::bit(32) as standard,
hash_range_extended(v, 0)::bit(32) as extended0,
hash_range_extended(v, 1)::bit(32) as extended1
FROM (VALUES (int8range(floor(random() * 100)::int8, floor(random() *
1000)::int8)),
(int8range(floor(random() * 1000)::int8, floor(random() *
10000)::int8)),
(int8range(floor(random() * 10000)::int8, floor(random() *
100000)::int8)),
(int8range(floor(random() * 10000000)::int8, floor(random() *
100000000)::int8)),
(int8range(floor(random() * 100000000)::int8, floor(random() *
1000000000)::int8))) x(v)
WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
> > Another case which I want to discuss is, extended and standard version of
> > hashfloat4, hashfloat8 & hash_numeric function will have the same output
> for
> > zero
> > value irrespective of seed value. Do you think we need a fix for this?
>
> Yes, I think you should return the seed rather than 0 in the cases
> where the current code hard-codes a 0 return. That will give the same
> behavior in the seed == 0 case while cheaply getting us something a
> bit different when there is a seed.
>
> Fixed in the attached version.
> BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
> patch.
>
> Added i
n the attached version
.
Thanks for your all the suggestions.
Regards,
Amul
Attachment | Content-Type | Size |
---|---|---|
0001-add-optional-second-hash-proc-v4.patch | application/octet-stream | 63.0 KB |
0002-test-Hash_functions_v3_wip.patch | application/octet-stream | 27.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2017-08-31 12:40:45 | Re: pgbench tap tests & minor fixes. |
Previous Message | Tom Lane | 2017-08-31 12:34:11 | Re: log_destination=file |