From: | Ildar Musin <i(dot)musin(at)postgrespro(dot)ru> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: General purpose hashing func in pgbench |
Date: | 2018-03-06 13:47:43 |
Message-ID: | 75f45845-24ed-347b-66d5-2f18d39c6793@postgrespro.ru |
Lists: | pgsql-hackers |
Hello Teodor,
Thank you for reviewing this patch.
On 06.03.2018 15:53, Teodor Sigaev wrote:
>>> Patch applies, compiles, pgbench & global "make check" ok, doc
>>> built ok.
>
> Agree.
>
> If I understand upthread correctly, implementation of Murmur hash
> algorithm based on Austin Appleby work
> https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp
>
> If so, I have notice and objections:
>
> 1) Seems, it's good idea to add credits to Austin Appleby to
> comments.
>
Sounds fair, I'll send an updated version soon.
> 2) Reference implementation directly says (link above):
> // 2. It will not produce the same results on little-endian and
> //    big-endian machines.
>
> I don't think that is good thing for testing and benchmarking for
> several reasons: it could produce different data collection,
> different selects, different distribution.
>
> 3) Again, from comments of reference implementation:
> // Note - This code makes a few assumptions about how your machine
> // behaves -
> // 1. We can read a 4-byte value from any address without crashing
>
> It's not true for all supported platforms. Any box with strict
> alignment will get SIGBUS here.
>
I think both points stem from the fact that the original algorithm
accepts a byte string as input, slices it into 8-byte chunks, and forms
unsigned int values from them. In that case the order of bytes in the
input string does matter, since the same bytes may yield different
integers on different architectures. It is also a fair requirement for
the byte string to be aligned, as some architectures cannot handle
unaligned data.
In this patch, though, I slightly modified the original algorithm so
that it takes unsigned ints as input (not a byte string), so neither of
these points should be a problem, as far as I can tell. But I'll
double-check it on a big-endian machine with strict alignment (Sparc).
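To illustrate the distinction, here is a minimal sketch. It is not the
patch code itself: the function names (murmur2_u64, load_le64,
load_be64) are hypothetical, and the constants and mixing steps are
assumed to follow the 64-bit variant (MurmurHash64A) in Austin Appleby's
reference file linked above. The hash takes an integer value, so it
never dereferences a possibly unaligned pointer, and its result depends
only on the value, not on how that value is laid out in memory. The two
loaders show where the byte-string variant picks up its dependence on
byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplier and shift assumed from the MurmurHash64A reference code. */
#define MM2_MUL 0xc6a4a7935bd1e995ULL
#define MM2_ROT 47

/*
 * Hypothetical helper: hash one 64-bit integer, treating it as a single
 * 8-byte block. Only arithmetic on the value is used, so there is no
 * memory read that could be unaligned or endian-dependent.
 */
static uint64_t
murmur2_u64(uint64_t val, uint64_t seed)
{
    uint64_t h = seed ^ (8 * MM2_MUL);  /* 8 = input length in bytes */
    uint64_t k = val;

    /* mix the single block */
    k *= MM2_MUL;
    k ^= k >> MM2_ROT;
    k *= MM2_MUL;

    h ^= k;
    h *= MM2_MUL;

    /* final avalanche */
    h ^= h >> MM2_ROT;
    h *= MM2_MUL;
    h ^= h >> MM2_ROT;

    return h;
}

/* Assemble 8 bytes into an integer the way a little-endian read would. */
static uint64_t
load_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t) p[i] << (8 * i);
    return v;
}

/* Assemble the same 8 bytes the way a big-endian read would. */
static uint64_t
load_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Feeding the same 8 bytes through load_le64 and load_be64 gives two
different integers and therefore two different hashes, which is the
portability problem of the byte-string interface. Calling murmur2_u64
directly on an integer avoids that step entirely.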
Thanks!
--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru