creating hash indexes

From: Rick Otten <rottenwindfish(at)gmail(dot)com>
To: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: creating hash indexes
Date: 2022-12-14 20:03:42
Message-ID: CAMAYy4KDKVP=Fu+c+Luc9U4huXQ=Ag+jnEUMcaYUdQXo9=UpJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I inherited a database with several single-digit billion row tables. Those
tables have a varchar(36) column populated with uuids (all connected to
each other via FKs) each currently supported by a btree index.

After the recent conversations about hash indexes I thought I'd do some
comparisons to see if using a hash index could help and perhaps
depriortize my burning desire to change the data type. We never look up
uuids with inequalities after all. Indeed, in my test environments the
hash index was half the size of the btree index, and the select performance
was slightly faster than btree lookups. varchar(36) with hash index was
roughly comparable to using a uuid data type (btree or hash index).

I was pretty excited until I tried to create the index on a table with the
data (instead of creating it ahead of time and then loading up the test
data).

Working in PG 14.5, on a tiny 9M row table, in an idle database, I found:
- creating the btree index on the varchar(36) column to consistently take 7
*seconds*
- creating the hash index on the varchar(36) to consistently take 1 *hour*

I was surprised at how dramatically slower it was. I tried this on both
partitioned and non-partitioned tables (with the same data set) and in both
cases the timings came out similar.

I also tried creating a hash index on a varchar(100) column, also with 9M
rows. I gave up after it did not complete after several hours. (it wasn't
locked, just slow)

While I was experimenting with the different index types, I did some insert
tests. After putting the hash index on the column, the inserts were
significantly slower. The btree index was *6-7x *slower than no index, and
the hash index was *100x* slower than no index.

Assuming I can live with the slower inserts, is there any parameter in
particular I can tweak that would make the time it takes to create the hash
index closer to the btree index creation time? In particular if I wanted
to try this on a several billion row table in a busy database?

---

FWIW, from my tests on my laptop, on a 250M row table last weekend, after
100K selects:

MEAN (ms) | btree | hash
--------- | ------- | ----
varchar | 28.14916 | 27.03769
uuid | 27.04855 | 27.64424

and the sizes

SIZE | btree | hash
---- | ----- | ----
varchar | 12 GB | 6212 MB
uuid | 6595 MB | 6212 MB

- As long as the index fits in memory, varchar btree isn't really that
much slower in postgresql 14 (the way it was a few years ago), so we'll
probably just live with that for the forseeable future given the complexity
of changing things at the moment.

--
Rick

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Peter Geoghegan 2022-12-14 20:28:47 Re: creating hash indexes
Previous Message Jordan Hurwich 2022-12-14 18:43:45 Re: Increased iowait and blk_read_time with higher shared_buffers