From: | amul sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [POC] hash partitioning |
Date: | 2017-10-24 07:13:12 |
Message-ID: | CAAJ_b96n4Q_tNLqVG78pm=CG9JiDNxmUiK=BWODzoegPJkW8+w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Oct 13, 2017 at 3:00 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-10-12 17:27:52 -0400, Robert Haas wrote:
>> On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> >> In other words, it's not utterly fixed in stone --- we invented
>> >> --load-via-partition-root primarily to cope with circumstances that
>> >> could change hash values --- but we sure don't want to be changing it
>> >> with any regularity, or for a less-than-excellent reason.
>> >
>> > Yea, that's what I expected. It'd probably be good for somebody to run
>> > smhasher or such on the output of the combine function (or even better,
>> > on both the 32 and 64 bit variants) in that case.
>>
>> Not sure how that test suite works exactly, but presumably the
>> characteristics in practice will depend on the behavior of the hash
>> functions used as input to the combine function - so the behavior could
>> be good for an (int, int) key but bad for a (text, date) key, or
>> whatever.
>
> I don't think that's true, unless you have really bad hash functions on
> the component hashes. A hash combine function can't really do
> anything about badly hashed input, what you want is that it doesn't
> *reduce* the quality of the hash by combining.
>
I tried to get the suggested SMHasher [1] test results for the 32-bit and
64-bit versions of hash_combine.
SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
N=255, using 256-N as the seed. To test hash_combine we need two hash
values to combine, so I generated 64-bit and 128-bit hashes of the given
SMHasher key using the cityhash functions [2], then split each hash into
two parts to test the 32-bit and 64-bit hash_combine functions
respectively. Attached are the patch with the SMHasher code changes and
the output of the 32-bit and 64-bit hash_combine tests. Note that I have
skipped the speed test, which is irrelevant here.
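To make the procedure concrete, here is a minimal sketch of the 32-bit case. It is not the attached patch. The combine formulas are written as I understand PostgreSQL's hash_combine() and hash_combine64() to be defined, so treat the constants as an assumption and check them against the tree. standin_hash64() is a hypothetical seeded FNV-1a placeholder used only to keep the example self-contained (the actual test used cityhash), and combined_hash_for_key() is likewise a name made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit combine step (assumed to mirror PostgreSQL's hash_combine). */
static inline uint32_t
hash_combine32(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9u + (a << 6) + (a >> 2);
    return a;
}

/* 64-bit combine step (assumed to mirror PostgreSQL's hash_combine64). */
static inline uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
    a ^= b + UINT64_C(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
    return a;
}

/*
 * Hypothetical stand-in for a seeded 64-bit hash: FNV-1a with the seed
 * XORed into the offset basis. The real test used cityhash instead.
 */
static uint64_t
standin_hash64(const uint8_t *key, size_t len, uint64_t seed)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325) ^ seed;

    for (size_t i = 0; i < len; i++)
    {
        h ^= key[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/*
 * Build the SMHasher-style key {0, 1, ..., n-1}, hash it with seed 256-n,
 * split the 64-bit result into two 32-bit halves, and combine the halves.
 */
static uint32_t
combined_hash_for_key(int n)
{
    uint8_t  key[255];
    uint64_t h;

    assert(n >= 1 && n <= 255);
    for (int i = 0; i < n; i++)
        key[i] = (uint8_t) i;

    h = standin_hash64(key, (size_t) n, (uint64_t) (256 - n));
    return hash_combine32((uint32_t) (h >> 32), (uint32_t) h);
}
```

The 64-bit case would follow the same shape: take a 128-bit hash of the key, split it into two 64-bit halves, and pass them to hash_combine64(). SMHasher then evaluates the combined value as if it were the output of an ordinary hash function.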
Comparing against the published results for other hash functions [3], the
hash_combine test results are not bad either.
Do let me know if the current testing is not good enough or if you want me
to do more testing. Thanks.
[1] https://github.com/aappleby/smhasher
[2] https://github.com/aappleby/smhasher/blob/master/src/CityTest.cpp
[3] https://github.com/rurban/smhasher/tree/master/doc
Regards,
Amul
Attachment | Content-Type | Size |
---|---|---|
smhasher_test_on_hash_combine64.txt | text/plain | 19.2 KB |
smhasher_test_on_hash_combine32.txt | text/plain | 14.0 KB |
0001-add-hash_combine-functions-in-SMHasher.patch | application/octet-stream | 3.0 KB |