From: | pgsql(at)mohawksoft(dot)com |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | sailesh(at)cs(dot)berkeley(dot)edu, "Dann Corbit" <dcorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Why hash indexes suck |
Date: | 2004-06-06 14:02:06 |
Message-ID: | 16596.24.91.171.78.1086530526.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
> Sailesh Krishnamurthy <sailesh(at)cs(dot)berkeley(dot)edu> writes:
>> This is probably a crazy idea, but is it possible to organize the data
>> in a page of a hash bucket as a binary tree ?
>
> Only if you want to require a hash opclass to supply ordering operators,
> which sort of defeats the purpose I think. Hash is only supposed to
> need equality not ordering.
>
A btree is frequently used within the buckets of a hash table, expecially
if you expect to have a large number of items in each bucket.
If PostgreSQL could create a hash table index which is a single top level
hash table with each hash bucket being a btree index. You can eliminate a
number of btree searches by hashing, and then fall into btree performance
after the first hash lookup. The administrator should be able to gather
statistics about the population of the hash buckets and rehash if
performance begins to behave like a btree or the data is not distributed
evenly. Given proper selection of the initial number of buckets, a hash
table could blow btree out of the water. Given a poor selection of the
number of buckets, i.e. 1, a hash will behave no worse than a btree.
Also, it would be helpful to be able to specify a hash function during the
create or rehash, given a specific class of data, extraction of the random
elements can be more efficient and/or effective given knowledge of the
data. Think something like "bar codes," there are portions of the data
which are ususally the same and portions of the data which are usually
different. Focusing on the portions of the data which tend to be different
will generally provide a more evenly distributed hash.
From | Date | Subject | |
---|---|---|---|
Next Message | Bruno Wolff III | 2004-06-06 15:07:51 | Re: row access |
Previous Message | Jeff Davis | 2004-06-06 09:21:14 | Re: [HACKERS] Slony-I goes BETA (possible bug) |
From | Date | Subject | |
---|---|---|---|
Next Message | Jan Wieck | 2004-06-06 17:32:12 | Re: [HACKERS] Slony-I goes BETA (possible bug) |
Previous Message | Andreas Pflug | 2004-06-06 09:34:01 | pgsql 7.5 frontend changes wrapup |