Re: Next Steps with Hash Indexes

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Next Steps with Hash Indexes
Date:	2021-08-11 15:17:57
Message-ID:	4005248.1628695077@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I have to admit that after working with Amit on all the work to make
> hash indexes WAL-logged a few years ago, I was somewhat disillusioned
> with the whole AM. It seems like a cool idea to me but it's just not
> that well-implemented.

Yeah, agreed. The whole buckets-are-integral-numbers-of-pages scheme
is pretty well designed to ensure bloat, but trying to ameliorate that
by reducing the number of buckets creates its own problems (since, as
you mention, we have no scheme whatever for searching within a bucket).
I'm quite unimpressed with Simon's upthread proposal to turn off bucket
splitting without doing anything about the latter issue.

I feel like we'd be best off to burn the AM to the ground and start
over. I do not know what a better design would look like exactly,
but I feel like it's got to decouple buckets from pages somehow.
Along the way, I'd want to store 64-bit hash values (we still haven't
done that, have we?).

As far as the specific point at hand is concerned, I think storing
a hash value per index column, while using only the first column's
hash for bucket selection, is what to do for multicol indexes.
We still couldn't set amoptionalkey=true for hash indexes, because
without a hash for the first column we don't know which bucket to
look in. But storing hashes for the additional columns would allow
us to check additional conditions in the index, and usually save
trips to the heap on queries that provide additional column
conditions. You could also imagine sorting the contents of a bucket
on all the hashes, which would ease uniqueness checks.
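
To make the multicolumn idea concrete, here is a rough Python sketch.
It is an illustration only, not PostgreSQL code: the class and method
names, the fixed bucket count, and the use of Python's built-in hash()
truncated to 64 bits are all assumptions made for the example. It shows
the first column's hash choosing the bucket, the other columns' hashes
filtering entries inside the index, and a bucket kept sorted on all the
hashes so that equal hash tuples sit next to each other.

```python
from bisect import bisect_left, insort

MASK64 = 0xFFFFFFFFFFFFFFFF  # keep 64 bits of each hash value


def hash64(value):
    """Stand-in hash function (assumption: Python's hash(), truncated)."""
    return hash(value) & MASK64


class MultiColHashIndex:
    """Hypothetical toy index: one hash per column, bucket chosen by column 0."""

    def __init__(self, n_buckets=8):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket_for(self, first_hash):
        # Only the first column's hash takes part in bucket selection.
        return self.buckets[first_hash % self.n_buckets]

    def insert(self, key, tid):
        hashes = tuple(hash64(v) for v in key)
        # Keep each bucket sorted on (all column hashes, tid).
        insort(self._bucket_for(hashes[0]), (hashes, tid))

    def search(self, first_value, extra=None):
        """Return candidate tids; extra maps column position -> value.

        The first column is mandatory: without its hash we cannot pick a
        bucket. Matches are candidates only, since hashes can collide.
        """
        h0 = hash64(first_value)
        wanted = {col: hash64(v) for col, v in (extra or {}).items()}
        bucket = self._bucket_for(h0)
        # Binary search to the first entry whose leading hash is >= h0.
        i = bisect_left(bucket, ((h0,),))
        tids = []
        while i < len(bucket) and bucket[i][0][0] == h0:
            hashes, tid = bucket[i]
            # Check the additional column hashes without a heap visit.
            if all(hashes[col] == h for col, h in wanted.items()):
                tids.append(tid)
            i += 1
        return tids

    def may_have_duplicate(self, key):
        """True if some entry has identical hashes in every column."""
        hashes = tuple(hash64(v) for v in key)
        bucket = self._bucket_for(hashes[0])
        i = bisect_left(bucket, (hashes,))
        return i < len(bucket) and bucket[i][0] == hashes


idx = MultiColHashIndex()
idx.insert((100, 1), tid=10)
idx.insert((100, 2), tid=11)
idx.insert((200, 1), tid=12)
first_only = idx.search(100)            # condition on column 0 only
both_cols = idx.search(100, {1: 2})     # extra condition on column 1
```

In this sketch a search on the first column alone returns every entry
sharing that hash, while adding a second-column condition prunes entries
inside the bucket. A real uniqueness check would still have to compare
the actual key values, because equal hashes do not imply equal keys.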

regards, tom lane

In response to

Re: Next Steps with Hash Indexes at 2021-08-11 14:54:09 from Robert Haas

Responses

Re: Next Steps with Hash Indexes at 2021-08-11 16:39:51 from Robert Haas
Re: Next Steps with Hash Indexes at 2021-08-12 03:39:31 from Dilip Kumar
Re: Next Steps with Hash Indexes at 2021-08-12 04:22:18 from Amit Kapila
