From: | Vitalii Tymchyshyn <tivv00(at)gmail(dot)com> |
---|---|
To: | Robert Klemme <shortcutter(at)googlemail(dot)com> |
Cc: | Merlin Moncure <mmoncure(at)gmail(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Hash index use presently(?) discouraged since 2005: revive or bury it? |
Date: | 2011-09-19 15:28:57 |
Message-ID: | 4E775FB9.6050704@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
19.09.11 18:19, Robert Klemme написав(ла):
> On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
>>
>> Postgres's hash index implementation used to be pretty horrible -- it
>> stored the pre-hashed datum in the index which, while making it easier
>> to do certain things, made it horribly slow, and, for all intents and
>> purposes, useless. Somewhat recently,a lot of work was put in to fix
>> that -- the index now packs the hash code only which made it
>> competitive with btree and superior for larger keys. However, certain
>> technical limitations like lack of WAL logging and uniqueness hold
>> hash indexing back from being used like it really should be. In cases
>> where I really *do* need hash indexing, I do it in userland.
>>
>> create table foo
>> (
>> a_long_field text;
>> );
>> create index on foo(hash(a_long_field));
>>
>> select * from foo where hash(a_long_field) = hash(some_value) and
>> a_long_field = some_value;
>>
>> This technique works fine -- the main disadvantage is that enforcing
>> uniqueness is a PITA but since the standard index doesn't support it
>> either it's no great loss. I also have the option of getting
>> 'uniqueness' and being able to skip the equality operation if I
>> sacrifice some performance and choose a strong digest. Until the hash
>> index issues are worked out, I submit that this remains the go-to
>> method to do this.
> Is this approach (storing the hash code in a btree) really faster than
> a regular btree index on "a_long_field"? And if so, for which kind of
> data and load?
Actually sometimes the field in [potentially] so long, you can't use
regular b-tree because it won't fit in the page. Say, it is "text" type.
If you will create regular index, you will actually limit column value
size to few KB. I am using md5(text) indexes in this case coupled with
rather ugly queries (see above). Native support would be nice.
Best regards, Vitalii Tymchyshyn.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-09-19 15:35:38 | Re: Hash index use presently(?) discouraged since 2005: revive or bury it? |
Previous Message | Robert Klemme | 2011-09-19 15:19:14 | Re: Hash index use presently(?) discouraged since 2005: revive or bury it? |