Re: Next Steps with Hash Indexes

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Next Steps with Hash Indexes
Date:	2021-08-11 14:54:09
Message-ID:	CA+TgmoYVAxE0PGdO9aDBj=pWNdkXbJHr5Udw5RHO+9j3e1=eDQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Aug 11, 2021 at 10:30 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I suspect it would be hard to store multiple hash values, one per
> > column. It seems to me that what we ought to do is combine the hash
> > values for the individual columns using hash_combine(64) and store the
> > combined value. I can't really imagine why we would NOT do that.
>
> That would make it impossible to use the index except with queries
> that provide equality conditions on all the index columns. Maybe
> that's fine, but it seems less flexible than other possible definitions.
> It really makes me wonder why anyone would bother with a multicol
> hash index.

Hmm. That is a point I hadn't considered.
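
To make the trade-off concrete, here is a minimal sketch of the "combine the per-column hashes and store one value" idea. It is a toy model, not PostgreSQL code: `column_hash` is a stand-in based on CRC32, `hash_combine` is a Boost-style 32-bit mix written here as an assumption about what such a combiner looks like, and `ToyMultiColumnHashIndex` is a hypothetical class invented for illustration.

```python
import zlib

MASK32 = 0xFFFFFFFF


def column_hash(value):
    """Stand-in per-column hash (assumption: not PostgreSQL's hash function)."""
    return zlib.crc32(repr(value).encode("utf-8")) & MASK32


def hash_combine(a, b):
    """Boost-style 32-bit mix of two hash values (illustrative assumption)."""
    a ^= (b + 0x9E3779B9 + (a << 6) + (a >> 2)) & MASK32
    return a & MASK32


def combined_key(values):
    """Fold the per-column hashes into the single value the index would store."""
    h = 0
    for v in values:
        h = hash_combine(h, column_hash(v))
    return h


class ToyMultiColumnHashIndex:
    """Hypothetical two-column hash index that stores only the combined hash."""

    def __init__(self):
        self.buckets = {}

    def insert(self, row):
        self.buckets.setdefault(combined_key(row), []).append(row)

    def lookup(self, row):
        # Equality on ALL columns: compute the combined key, probe one bucket,
        # then recheck the rows to filter out hash collisions.
        return [r for r in self.buckets.get(combined_key(row), []) if r == row]

    def lookup_first_column(self, a):
        # Equality on only the first column: the combined key cannot be
        # computed, so the only option is to scan every bucket.
        return [r for rows in self.buckets.values() for r in rows if r[0] == a]
```

With both columns supplied, `lookup((1, "x"))` probes a single bucket. With only the first column, `lookup_first_column(1)` degenerates into a full scan, which is the loss of flexibility described in the quoted text.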

I have to admit that after working with Amit on all the work to make
hash indexes WAL-logged a few years ago, I was somewhat disillusioned
with the whole AM. It seems like a cool idea to me but it's just not
that well-implemented. For example, the strategy of just doubling the
number of buckets in one shot seems pretty terrible for large indexes,
and ea69a0dead5128c421140dc53fac165ba4af8520 will buy only a limited
amount of relief. Likewise, the fact that keys are stored in hash
value order within pages but that the bucket as a whole is not kept in
order seems like it's bad for search performance and really bad for
implementing unique indexes with reasonable amounts of locking. (I
don't know how the present patch tries to solve that problem.) It's
tempting to consider creating something altogether new instead of
hacking on the existing implementation, but
that's a lot of work and I'm not sure what specific design would be
best.
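
The point about ordering can be sketched with a toy model. Assume a bucket is a chain of pages, each page holds hash values in sorted order, and the chain can only be walked front to back. Both function names and the `visited` counter are hypothetical, and none of this reflects the actual on-disk layout.

```python
import bisect


def search_page_sorted_bucket(pages, target):
    """Pages are sorted internally, but there is no order across pages.

    Every page in the chain must be visited, because a match could be anywhere.
    """
    hits, visited = [], 0
    for page in pages:
        visited += 1
        lo = bisect.bisect_left(page, target)
        hi = bisect.bisect_right(page, target)
        hits.extend(page[lo:hi])
    return hits, visited


def search_fully_sorted_bucket(pages, target):
    """Hypothetical alternative: the whole chain is kept in hash-value order.

    The walk can stop at the first page whose smallest entry exceeds the target.
    """
    hits, visited = [], 0
    for page in pages:
        visited += 1  # the page is read even if it only tells us to stop
        if page and page[0] > target:
            break
        lo = bisect.bisect_left(page, target)
        hi = bisect.bisect_right(page, target)
        hits.extend(page[lo:hi])
    return hits, visited


# Same nine hash values, laid out two ways.
page_sorted = [[5, 40, 90], [3, 7, 60], [7, 10, 80]]
fully_sorted = [[3, 5, 7], [7, 10, 40], [60, 80, 90]]
```

In this model a uniqueness check for hash value 5 has to read all three pages of `page_sorted`, but only two pages of `fully_sorted`. The gap grows with the length of the chain, and so does the number of pages that would need to be locked while checking for duplicates.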

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Next Steps with Hash Indexes at 2021-08-11 14:30:04 from Tom Lane

Responses

Re: Next Steps with Hash Indexes at 2021-08-11 15:17:57 from Tom Lane
Re: Next Steps with Hash Indexes at 2021-08-11 15:51:00 from John Naylor
