Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date: 2011-09-16 01:00:17
Message-ID: CAHyXU0yHQM3e6afLuQETk+LKD-TZO39mrF2uL+rk0JD6RCeotA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Sep 15, 2011 at 5:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> HM, what if you junked the current hash indexam, and just implemented
>> a wrapper over btree so that the 'hash index' was just short hand for
>> hashing the value into a standard index?
>
> Surely creating such a wrapper would be *more* work than adding WAL
> support to the hash AM.
>
> I'm not entirely following this eagerness to junk that AM, anyway.
> We've put a lot of sweat into it over the years, in the hopes that
> it would eventually be good for something.  It's on the edge of
> being good for something now, and there's doubtless room for more
> improvements, so why are the knives out?

Just making an observation. Some quick tests follow the sig. I think
the point here is that something has to be done -- now that the
replication train has left the station, not having WAL has gone from
quirky annoyance to major functionality failure. The recent hash work
has brought down index build times to a reasonable level, but they are
still getting beat by btree. Of course, it's not quite apples to
apples (I figure the timings will even out to an extent once you add
in the hashing wrapper), but I can't help but wonder if the btree code
is a better driver and consolidating code is a good thing.

merlin

postgres=# create table v as select generate_series(1,10000000) as x;
SELECT 10000000
Time: 16750.961 ms
postgres=# create index on v(x);
CREATE INDEX
Time: 15158.637 ms
postgres=# create index on v using hash(x);
CREATE INDEX
Time: 22505.468 ms

postgres=# \d v
Table "public.v"
Column | Type | Modifiers
--------+---------+-----------
x | integer |
Indexes:
"v_x_idx" btree (x)
"v_x_idx1" hash (x)

postgres=# select relname, relfilenode from pg_class where relname like 'v_x%';
relname | relfilenode
----------+-------------
v_x_idx | 16525
v_x_idx1 | 16526
(2 rows)

c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16525
09/15/2011 07:46 PM 224,641,024 16525

c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16526
09/15/2011 07:49 PM 268,451,840 16526

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2011-09-16 01:04:58 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Previous Message Claudio Freire 2011-09-16 00:34:04 Re: Hash index use presently(?) discouraged since 2005: revive or bury it?