Quick Links

Re: Hash index todo list item

From:	Brian Hurt <bhurt(at)janestcapital(dot)com>
To:	Kenneth Marshall <ktm(at)rice(dot)edu>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Hash index todo list item
Date:	2007-09-07 15:08:13
Message-ID:	46E1695D.5000909@janestcapital.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Kenneth Marshall wrote:

>>>
>>>
>>>
>>How likely is it that you will get a hash collision, two strings that are
>>different that will hash to the same value? To avoid this requires a very
>>large hash key (128 bits, minimum)- otherwise you get into birthday attack
>>problems. With a 32-bit hash, the likelyhood is greater than 50% that two
>>strings in a collection of 100,000 will hash to the same value. With a
>>64-bit hash, the likelyhood is greater than 50% that two strings in a
>>collection of 10 billion will has to same value. 10 billion is a large
>>number, but not an unreasonable number, of strings to want to put into a
>>hash table- and it's exactly this case where the O(1) cost of hashtables
>>starts being a real win.
>>
>>Brian
>>
>>
>>
>Yes, there is a non-negligible chance of collision (In a DB is there
>any chance that is non-negligible? :) ) and the values must be checked
>against the actual. The win is the collapse of the index size and only
>needed to check a small fraction of the actual tuples.
>
>
>
>

Ah, OK- I misunderstood you. I thought you were saying that the hash
values would need to be unique, and you wouldn't check the original
values at all. My bad.

Brian

In response to

Re: Hash index todo list item at 2007-09-07 14:55:07 from Kenneth Marshall

Responses

Re: Hash index todo list item at 2007-09-07 15:15:26 from Kenneth Marshall

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dave Page	2007-09-07 15:14:06	Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Previous Message	Markus Schiltknecht	2007-09-07 15:01:08	terms for database replication: synchronous vs eager