Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Robert Klemme <shortcutter(at)googlemail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date: 2011-09-19 16:14:38
Message-ID: CAGTBQpbHjm9PJQ7Xh-2QUouZBVDeAGZCYqcGSVewTTSma9oZbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Sep 19, 2011 at 12:54 PM, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com> wrote:
> 19.09.11 18:19, Robert Klemme написав(ла):
>>
>> I still haven't seen a solution to locking when a hash table needs
>> resizing.  All hashing algorithms I can think of at the moment would
>> require a lock on the whole beast during the resize which makes this
>> type of index impractical for certain loads (heavy updating).
>
> Sorry for the second reply, I should have not start writing until I've read
> all your post. Anyway.
> Do you need read lock? I'd say readers could use "old" copy of hash table up
> until the moment new bigger copy is ready. This will simply look like the
> update is not started yet, which AFAIK is OK for MVCC.
> Yep, all the writers will wait.

All this would get solved if there's no automatic hash index resizing.

DBAs would have to recreate (possibly concurrently) the hash to make it bigger.

Still, hash has lots of issues. I'm not sure how the hash is
implemented in PG, but usually, for low collision rates pseudorandom
walks are used to traverse collision chains. But pseudorandom
collision chains mean random I/O which is awful for a DB. Those
techniques have not been designed to work with secondary memory.

So, they would have to be adapted to working with secondary memory,
and that means a lot of R&D. It's not impossible, it's just a lot of
work.

I subscribe to the idea that, *in the meanwhile*, without scrapping
the hash index and in parallel to improving it, an option for
transparently-hashed btrees would be valuable.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2011-09-19 16:15:45 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Previous Message Vitalii Tymchyshyn 2011-09-19 15:54:24 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?