From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-08-12 15:57:58
Message-ID: CAH2-WzkMTgytnQtJ4NKchJrLRNx44951OxtrhoMpjbak+oNbUA@mail.gmail.com
Lists: pgsql-hackers
On Wed, Aug 11, 2021 at 8:51 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> (Standard disclaimer that I'm not qualified to design index AMs) I've seen one mention in the literature about the possibility of simply having a btree index over the hash values. That would require faster search within pages, in particular using abbreviated keys in the ItemId array of internal pages [1] and interpolated search rather than pure binary search (which should work reliably with high-entropy keys like hash values), but doing that would speed up all btree indexes, so that much is worth doing regardless of how hash indexes are implemented. In that scheme, the hash index AM would just be around for backward compatibility.
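(As an aside on the interpolated-search idea: a minimal standalone sketch of
an interpolated probe over uniformly distributed 32-bit hash values might look
something like the following. Purely illustrative, not anything from
nbtsearch.c.)

#include <stdint.h>

/*
 * Interpolated probe over a sorted array of 32-bit hash values.  With
 * high-entropy keys the guessed position lands close to the target, so the
 * search window shrinks much faster than with a plain midpoint split.
 */
static int
interpolation_search(const uint32_t *keys, int nkeys, uint32_t target)
{
    int     lo = 0;
    int     hi = nkeys - 1;

    while (lo <= hi && target >= keys[lo] && target <= keys[hi])
    {
        uint64_t    range = (uint64_t) keys[hi] - keys[lo];
        int         pos;

        if (range == 0)
            pos = lo;
        else
            pos = lo + (int) (((uint64_t) (target - keys[lo]) * (hi - lo)) / range);

        if (keys[pos] == target)
            return pos;         /* found */
        else if (keys[pos] < target)
            lo = pos + 1;
        else
            hi = pos - 1;
    }
    return -1;                  /* not found */
}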
I think that it's possible (though hard) to do that without involving
hashing, even for datatypes like text. Having some kind of prefix
compression that makes the final abbreviated keys have high entropy
would be essential, though. I agree that it would probably be
significantly easier when you knew you were dealing with hash values,
but even there you need some kind of prefix compression.
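To give a rough idea of what I mean by prefix compression feeding the
abbreviated keys, here is a standalone sketch: skip the prefix that the page's
low and high keys have in common, and pack the next few bytes into a
fixed-width integer. The helper names are made up and C-collation byte
ordering is assumed -- this is not an existing PostgreSQL interface.

#include <stdint.h>
#include <string.h>

/* Length of the prefix shared by the page's low and high keys */
static size_t
shared_prefix_len(const char *lowkey, const char *highkey)
{
    size_t      n = 0;

    while (lowkey[n] != '\0' && lowkey[n] == highkey[n])
        n++;
    return n;
}

/*
 * Abbreviate one key relative to the shared prefix.  Bytes that every key on
 * the page has in common carry no information, so skipping them keeps the
 * abbreviated keys high entropy.
 */
static uint32_t
abbreviate_key(const char *key, size_t skip)
{
    unsigned char buf[4] = {0, 0, 0, 0};
    size_t      len = strlen(key);
    size_t      avail = (len > skip) ? len - skip : 0;

    memcpy(buf, key + skip, avail < sizeof(buf) ? avail : sizeof(buf));

    /* Big-endian packing preserves the byte-wise sort order */
    return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
           ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
}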
In any case I suspect that it would make sense to reimplement hash
indexes as a translation layer between hash index opclasses and
nbtree. Robert said "Likewise, the fact that keys are stored in hash
value order within pages but that the bucket as a whole is not kept in
order seems like it's bad for search performance". Obviously we've
already done a lot of work on an index AM that deals with a fully
ordered keyspace very well. This includes dealing with large groups of
duplicates gracefully, since in a certain sense there are no duplicate
B-Tree index tuples -- the heap TID tiebreaker ensures that. And it
ensures that you have heap-wise locality within these large groups,
which is a key enabler of things like opportunistic index deletion.
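To make the tiebreaker point concrete, the comparator used by such a
translation layer could look roughly like this (a simplified standalone
sketch with made-up struct names, not the real nbtree code): compare the hash
value first, and only fall back to the heap TID when the hash values are
equal, so no two index entries ever compare as equal.

#include <stdint.h>

typedef struct IndexEntry
{
    uint32_t    hashval;        /* indexed key: hash of the original value */
    uint32_t    heap_block;     /* heap TID: block number */
    uint16_t    heap_offset;    /* heap TID: line pointer offset */
} IndexEntry;

static int
index_entry_cmp(const IndexEntry *a, const IndexEntry *b)
{
    if (a->hashval != b->hashval)
        return (a->hashval < b->hashval) ? -1 : 1;
    if (a->heap_block != b->heap_block)
        return (a->heap_block < b->heap_block) ? -1 : 1;
    if (a->heap_offset != b->heap_offset)
        return (a->heap_offset < b->heap_offset) ? -1 : 1;
    return 0;
}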
When hash indexes have been used in database systems, it tends to be
in-memory database systems where the recovery path doesn't recover
indexes -- they're just rebuilt from scratch instead. If that's
already a baked-in assumption then hash indexes make more sense. To me
it seems like the problem with true hash indexes is that they're
constructed in a top-down fashion, which is approximately the opposite
of the bottom-up, incremental approach used by B-Tree indexing. This
seems to be where all the skew problems arise. This approach cannot
really be robust to changes in the keyspace over time.
--
Peter Geoghegan