From: | Kenneth Marshall <ktm(at)rice(dot)edu> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: [PATCHES] updated hash functions for postgresql v1 |
Date: | 2009-01-10 22:14:10 |
Message-ID: | 20090110221410.GA7131@it.is.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jan 10, 2009 at 10:56:15AM -0800, Jeff Davis wrote:
> On Sat, 2009-01-10 at 13:36 -0500, Tom Lane wrote:
> > Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > > I ran 5 times on both old and new code, eliminating the top and bottom
> > > and taking the average of the remaining 3, and I got a 6.9% performance
> > > improvement with the new code.
> >
> > The question that has been carefully evaded throughout the discussion
> > of this patch is whether the randomness of the hash result is decreased,
> > and if so what is that likely to cost us in performance of the various
> > hash-dependent algorithms. I would still like to have an answer to that
> > before we make a change to gain marginal performance improvement in
> > the hash function itself (which is already shown to be barely measurable
> > in the total context of a hash-dependent operation...)
> >
>
> In:
> http://archives.postgresql.org/message-id/20081104202655.GP18362@it.is.rice.edu
>
> Ken showed that the number of hash collisions is the same in the old and
> the new code for his test data. Not a direct measurement of randomness,
> but it's some kind of indication.
>
> I'm not an authority on either hash functions or statistics, so I'll
> have to defer to Ken or someone else to actually prove that the
> randomness is maintained.
>
> Regards,
> Jeff Davis
>
First, while I am not an expert by any means, basic statistics will give you the
probability of a collision when packing N items into M buckets chosen at random.
In all of my tests, I used 1.6M items into 2^32 buckets. If the bits mixing is
complete and random, the number of collisions should be close to the same no
matter what arrangement of bits are used for inputs. I think my test cases
indicate quite clearly that the new hash function is as independent of bit-
order as the older functions, in that the number of bucket collisions is
basically the same for all layouts. I have the test harness and can test
any other input that you would like me to. Second, the author of the basic
hash function (old and new) has performed more extensive testing and has
shown that the new functions are better in randomizing bits than his original
function. I will try and run a micro-benchmark of my original patch in March
and the result of the incremental approach that is the result of my Novermber
patch tomorrow.
Cheers,
Ken
From | Date | Subject | |
---|---|---|---|
Next Message | Kenneth Marshall | 2009-01-10 22:20:38 | Re: [PATCHES] updated hash functions for postgresql v1 |
Previous Message | Pavel Stehule | 2009-01-10 21:17:46 | Re: [HACKERS] BUG #4516: FOUND variable does not work after RETURN QUERY |