Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date: 2011-09-19 02:14:39
Message-ID: CAMkU=1xYy0rH9E7JfgMJEzDuESTs=diRkuiRfh7VF8RA+Mg4PA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Sep 18, 2011 at 7:59 AM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
> Merlin and Jeff,
>
> General remark again:It's hard for me to imagine that btree is
> superior for all the issues mentioned before. I still believe in hash
> index for primary keys and certain unique constraints where you need
> equality search and don't need ordering or range search.

I certainly agree that hash indexes as implemented in PG
could be improved on.

>
> 2011/9/17 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:
> (...)
>> Also, that link doesn't address concurrency of selects at all, only of inserts.
>
> How would (or did) you test and benchmark concurrency of inserts and selects?
> Use pgbench with own config for a blackbox test?

I used pgbench -S -M prepared with a scale that fits in
shared_buffers, at various concurrencies. drop the pgbench_accounts
primary key and build alternatingly a regular index and a hash index
between runs. (If the scale doesn't fit in memory, that should
advantage the hash, but I haven't seen a large one--I've never tested
a size at which the branch blocks don't fit in memory)

It is hard to see real differences here because the index is not the
main bottleneck, regardless of which index is in use (at least on only
8 CPUs, with enough CPUs you might be able to drive the hash index
over the edge)

I also used a custom pgbench option -P, (a patch adding which feature
I was supposed to submit to this commitfest, but missed). Cuts down
on a lot of the network chatter, locking, and other overhead and so
simulates an index look up occurring on the inside of a nested loop.

The performance at -c 1 was roughly equal, but at -c 8 the hash was
three times slower.

I don't recall concurrent testing inserts (not for speed, anyway).

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2011-09-19 02:37:44 Re: cannot use multicolumn index
Previous Message Claudio Freire 2011-09-19 01:27:22 Re: How to make hash indexes fast