Re: Next Steps with Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-08-11 14:54:09
Message-ID: CA+TgmoYVAxE0PGdO9aDBj=pWNdkXbJHr5Udw5RHO+9j3e1=eDQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 11, 2021 at 10:30 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I suspect it would be hard to store multiple hash values, one per
> > column. It seems to me that what we ought to do is combine the hash
> > values for the individual columns using hash_combine(64) and store the
> > combined value. I can't really imagine why we would NOT do that.
>
> That would make it impossible to use the index except with queries
> that provide equality conditions on all the index columns. Maybe
> that's fine, but it seems less flexible than other possible definitions.
> It really makes me wonder why anyone would bother with a multicol
> hash index.

Hmm. That is a point I hadn't considered.
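
Tom's objection can be made concrete with a small model. This is a minimal sketch under stated assumptions: hash_combine64() is ported from the mixing formula we believe PostgreSQL uses in src/include/common/hashfn.h (check the constants there before relying on them); column_hash() is a hypothetical stand-in built on BLAKE2b, not PostgreSQL's per-type hash support functions; the zero seed and the power-of-two bucket mask are simplifications. It shows that rows sharing a = 1 but differing in b scatter across buckets, so a query with only "a = 1" cannot be directed to a single bucket.

```python
import hashlib

MASK64 = (1 << 64) - 1


def hash_combine64(a: int, b: int) -> int:
    """Port of the mixing step we believe PostgreSQL's hash_combine64() uses."""
    # C relies on uint64 wraparound; Python ints are unbounded, so mask explicitly.
    mixed = (b + 0x49A0F4DD15E5A8E3 + ((a << 54) & MASK64) + (a >> 7)) & MASK64
    return (a ^ mixed) & MASK64


def column_hash(value) -> int:
    """Hypothetical 64-bit per-column hash (NOT PostgreSQL's hash opclass functions)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def combined_key(row: tuple) -> int:
    """Fold per-column hashes into the single value the index would store."""
    h = 0  # hypothetical seed; the thread does not specify one
    for value in row:
        h = hash_combine64(h, column_hash(value))
    return h


def bucket_for(row: tuple, nbuckets: int) -> int:
    """Map a row to a bucket; assumes nbuckets is a power of two (simplification)."""
    return combined_key(row) & (nbuckets - 1)


# A query with "a = 1" but no condition on b: rows sharing a = 1 spread over
# many buckets, so the combined hash cannot narrow the scan to one bucket.
NBUCKETS = 64
buckets_for_a1 = {bucket_for((1, b), NBUCKETS) for b in range(100)}
print(f"rows with a = 1 land in {len(buckets_for_a1)} of {NBUCKETS} buckets")
```

Storing one combined value keeps each index tuple small, but the price is exactly the one Tom names: every index column needs an equality condition before the probe key can be computed.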

I have to admit that after working with Amit to make hash indexes
WAL-logged a few years ago, I was somewhat disillusioned with the whole
AM. It seems like a cool idea to me, but it's just not that
well-implemented. For example, the strategy of doubling the number of
buckets in one shot seems pretty terrible for large indexes, and commit
ea69a0dead5128c421140dc53fac165ba4af8520 will buy only limited relief.
Likewise, keys are stored in hash-value order within each page, but the
bucket as a whole is not kept in order; that seems bad for search
performance and really bad for implementing unique indexes with
reasonable amounts of locking. (I don't know how the present patch tries
to solve that problem.) It's tempting to consider creating something
altogether new instead of hacking on the existing implementation, but
that's a lot of work and I'm not sure what specific design would be
best.
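
The cost of doubling in one shot can be sketched numerically. The model below rests on assumptions made only for illustration: one 8 kB primary page per new bucket, no overflow or bitmap pages, and a hypothetical "phased" variant that splits each doubling into four equal steps. The phased variant is not meant as a description of what commit ea69a0dead5128c421140dc53fac165ba4af8520 actually does.

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes


def doubling_allocations(initial_buckets: int, expansions: int) -> list[int]:
    """New buckets allocated per step when each expansion doubles the index at once."""
    steps = []
    total = initial_buckets
    for _ in range(expansions):
        steps.append(total)  # doubling adds as many buckets as already exist
        total *= 2
    return steps


def phased_allocations(initial_buckets: int, expansions: int, phases: int = 4) -> list[int]:
    """Same growth, but each doubling is split into `phases` equal chunks (hypothetical)."""
    steps = []
    total = initial_buckets
    for _ in range(expansions):
        if total % phases:
            raise ValueError("bucket count must be divisible by the number of phases")
        steps.extend([total // phases] * phases)
        total *= 2
    return steps


# Start from 2**20 buckets (one primary page each, by assumption) and grow 3 times.
one_shot = doubling_allocations(1 << 20, 3)
phased = phased_allocations(1 << 20, 3)
print("largest single allocation, one shot:", max(one_shot) * BLCKSZ // 2**30, "GiB")
print("largest single allocation, phased:  ", max(phased) * BLCKSZ // 2**30, "GiB")
print("total allocated is identical:", sum(one_shot) == sum(phased))
```

In this toy model, phasing cuts the largest single allocation by a factor of four but leaves the total growth, and its exponential shape, unchanged; that is one plausible reading of why such a change buys only limited relief for large indexes.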

--
Robert Haas
EDB: http://www.enterprisedb.com
