Quick Links

How to make hash indexes fast

From:	Craig James <craig_james(at)emolecules(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	How to make hash indexes fast
Date:	2011-09-19 01:14:01
Message-ID:	4E769759.7060101@emolecules.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Regarding the recent discussion about hash versus B-trees: Here is a trick I invented years ago to make hash indexes REALLY fast. It eliminates the need for a second disk access to check the data in almost all cases, at the cost of an additional 32-bit integer in the hash-table data structure.

Using this technique, we were able to load a hash-indexed database with data transfer rates that matched a cp (copy) command of the same data on Solaris, HP-UX and IBM AIX systems.

You build a normal hash table with hash-collision chains. But you add another 32-bit integer "signature" field to the hash-collision record (call it "DBsig"). You also create a function:

signature = sig(key)

that produces digital signature. The critical factor in the sig() function is that there is an average of 9 bits set (i.e. it is somewhat "sparse" on bits).

DBsig for a hash-collision chain is always the bitwise OR of every record in that hash-collision chain. When you add a record to the hash table, you do a bitwise OR of its signature into the existing DBsig. If you delete a record, you erase DBsig and rebuild it by recomputing the signatures of each record in the hash-collision chain and ORing them together again.

That means that for any key K, if K is actually on the disk, then all of the bits of sig(K) are always set in the hash-table record's "DBsig". If any one bit in sig(K) isn't set in "DBsig", then K is not in the database and you don't have to do a disk access to verify it. More formally, if

sig(K) AND DBsig != sig(K)

then K is definitely not in the database.

A typical hash table implementation might operate with a hash table that's 50-90% full, which means that the majority of accesses to a hash index will return a record and require a disk access to check whether the key K is actually in the database. With the signature method, you can eliminate over 99.9% of these disk accesses -- you only have to access the data when you actually want to read or update it. The hash table can usually fit easily in memory even for large tables, so it is blazingly fast.

Furthermore, performance degrades gracefully as the hash table becomes overloaded. Since each signature has 9 bits set, you can typically have 5-10 hash collisions (a lot of signatures ORed together in each record's DBsig) before the false-positive rate of the signature test gets too high. As the hash table gets overloaded and needs to be resized, the false positives increase gradually and performance decreases due to the resulting unnecessary disk fetches to check the key. In the worst case (e.g. a hash table that's overloaded by a factor of 10 or more), performance degrades to what it would be without the signatures.

For much higher selectivity, you could use a 64-bit signatures and make the sig() set an average of 13 bits. You'd get very good selectivity even in a badly overloaded hash table (long hash-collision chains).

This technique was never patented, and it was disclosed at several user-group meetings back in the early 1990's, so there are no restrictions on its use. If this is of any use to anyone, maybe you could stick my name in the code somewhere.

Craig James (the other Craig)

In response to

Re: Hash index use presently(?) discouraged since 2005: revive or bury it? at 2011-09-18 19:31:55 from Stefan Keller

Responses

Re: How to make hash indexes fast at 2011-09-19 01:25:16 from Ondrej Ivanič
Re: How to make hash indexes fast at 2011-09-19 02:58:02 from Jeff Janes

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Craig James	2011-09-19 01:15:45	Re: Index containing records instead of pointers to the data?
Previous Message	Thom Brown	2011-09-18 20:21:52	Re: Index containing records instead of pointers to the data?