From: | Andrew - Supernews <andrew+nonews(at)supernews(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: hashtext () and collisions |
Date: | 2007-04-12 03:54:59 |
Message-ID: | slrnf1rbci.2jjj.andrew+nonews@atlantis.supernews.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 2007-04-11, "Leon Mergen" <leon(at)solatis(dot)com> wrote:
> Now, my question is: how big is the chance that a collision happens
> between hashes ? I noticed that the function only returns a 32 bit
> number, so I figure it must be at least once in the 4 billion values.
Assuming it's a uniform random hash, 32 bits long, then if you have
65536 values, you have a ~40% chance of at least one collision. Any
defects in the hash function only increase that probability.
This is a result of what's known as the "birthday paradox" (so-called
because in a group of 23 people, there is a better than even chance that
two of them share a birthday). The number of rows needed to have an
approximately even chance of at least one collision grows as the
_square root_ of the number of hash buckets; or to put it another way,
you always need _more than twice as many bits_ in your hash value than
you think you do. (e.g. using md5(), which is a 128-bit hash)
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
From | Date | Subject | |
---|---|---|---|
Next Message | Patrick TJ McPhee | 2007-04-12 04:40:15 | Re: The rule question before, request official documentation on the problem |
Previous Message | Guy Rouillier | 2007-04-12 03:35:59 | Re: Evaluate only one CASE WHEN in a select |