From: | Brian Hurt <bhurt(at)janestcapital(dot)com> |
---|---|
To: | Kenneth Marshall <ktm(at)rice(dot)edu> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Hash index todo list item |
Date: | 2007-09-07 15:08:13 |
Message-ID: | 46E1695D.5000909@janestcapital.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Kenneth Marshall wrote:
>>>
>>>
>>>
>>How likely is it that you will get a hash collision, two strings that are
>>different that will hash to the same value? To avoid this requires a very
>>large hash key (128 bits, minimum)- otherwise you get into birthday attack
>>problems. With a 32-bit hash, the likelyhood is greater than 50% that two
>>strings in a collection of 100,000 will hash to the same value. With a
>>64-bit hash, the likelyhood is greater than 50% that two strings in a
>>collection of 10 billion will has to same value. 10 billion is a large
>>number, but not an unreasonable number, of strings to want to put into a
>>hash table- and it's exactly this case where the O(1) cost of hashtables
>>starts being a real win.
>>
>>Brian
>>
>>
>>
>Yes, there is a non-negligible chance of collision (In a DB is there
>any chance that is non-negligible? :) ) and the values must be checked
>against the actual. The win is the collapse of the index size and only
>needed to check a small fraction of the actual tuples.
>
>
>
>
Ah, OK- I misunderstood you. I thought you were saying that the hash
values would need to be unique, and you wouldn't check the original
values at all. My bad.
Brian
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Page | 2007-09-07 15:14:06 | Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC) |
Previous Message | Markus Schiltknecht | 2007-09-07 15:01:08 | terms for database replication: synchronous vs eager |