Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Klemme <shortcutter(at)googlemail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date: 2011-09-19 15:28:57
Message-ID: 4E775FB9.6050704@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

19.09.11 18:19, Robert Klemme написав(ла):
> On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
>>
>> Postgres's hash index implementation used to be pretty horrible -- it
>> stored the pre-hashed datum in the index which, while making it easier
>> to do certain things, made it horribly slow, and, for all intents and
>> purposes, useless. Somewhat recently,a lot of work was put in to fix
>> that -- the index now packs the hash code only which made it
>> competitive with btree and superior for larger keys. However, certain
>> technical limitations like lack of WAL logging and uniqueness hold
>> hash indexing back from being used like it really should be. In cases
>> where I really *do* need hash indexing, I do it in userland.
>>
>> create table foo
>> (
>> a_long_field text;
>> );
>> create index on foo(hash(a_long_field));
>>
>> select * from foo where hash(a_long_field) = hash(some_value) and
>> a_long_field = some_value;
>>
>> This technique works fine -- the main disadvantage is that enforcing
>> uniqueness is a PITA but since the standard index doesn't support it
>> either it's no great loss. I also have the option of getting
>> 'uniqueness' and being able to skip the equality operation if I
>> sacrifice some performance and choose a strong digest. Until the hash
>> index issues are worked out, I submit that this remains the go-to
>> method to do this.
> Is this approach (storing the hash code in a btree) really faster than
> a regular btree index on "a_long_field"? And if so, for which kind of
> data and load?

Actually sometimes the field in [potentially] so long, you can't use
regular b-tree because it won't fit in the page. Say, it is "text" type.
If you will create regular index, you will actually limit column value
size to few KB. I am using md5(text) indexes in this case coupled with
rather ugly queries (see above). Native support would be nice.

Best regards, Vitalii Tymchyshyn.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-09-19 15:35:38 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Previous Message Robert Klemme 2011-09-19 15:19:14 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?