Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Klemme <shortcutter(at)googlemail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date: 2011-09-19 15:54:24
Message-ID: 4E7765B0.4000900@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

19.09.11 18:19, Robert Klemme написав(ла):
>
> I still haven't seen a solution to locking when a hash table needs
> resizing. All hashing algorithms I can think of at the moment would
> require a lock on the whole beast during the resize which makes this
> type of index impractical for certain loads (heavy updating).
Sorry for the second reply, I should have not start writing until I've
read all your post. Anyway.
Do you need read lock? I'd say readers could use "old" copy of hash
table up until the moment new bigger copy is ready. This will simply
look like the update is not started yet, which AFAIK is OK for MVCC.
Yep, all the writers will wait.

Another option could be to start background build of larger hash - for
some time your performance will be degraded since you are writing to two
indexes instead of one plus second one is rebuilding, but I'd say low
latency solution is possible here.

One more: I don't see actually why can't you have a "rolling" expand of
hash table. I will try to describe it correct me if I am wrong:
1) The algorithm I am talking about will take "n" bits from hash code to
for hash table. So, during expansion it will double number of baskets.
2) Say, we are going from 2^n = n1 to 2^(n+1) = n2 = n1 * 2 baskets.
Each new pair of baskets will take data from single source basket
depending on the value of new hash bit used. E.g. if n were 2, we've had
4 baskets and new table will have 8 baskets. Everything from old basket
#1 will go into new baskets #2 and #3 depending on hash value.
3) So, we can have a counter on number of baskets processed. Any
operation on any lower numbered basket will go to "new set". Any
operation on any higher numbered basket will go to "old set". Any
operation on currently converting basket will block until conversion is
done.

P.S. Sorry for a lot of possibly dumb thoughts, I don't know why I've
got such a though stream on this topic :)

Best regards, Vitalii Tymchyshyn.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2011-09-19 16:14:38 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Previous Message Tom Lane 2011-09-19 15:35:38 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?