From: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Sadhuprasad Patro <b(dot)sadhu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Next Steps with Hash Indexes |
Date: | 2021-10-05 10:38:02 |
Message-ID: | CANbhV-GhTaDedzG-+zBDhMgX1eLSJeyP+XrySoPuHKMp1p9Fow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 27 Sept 2021 at 06:52, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro <b(dot)sadhu(at)gmail(dot)com> wrote:
> > >
> > > And to get the multi-column hash index selected, we may set
> > > enable_hashjoin =off, to avoid any condition become join condition,
> > > saw similar behaviors in other DBs as well...
> >
> > This may be related to Tom's point that, if some of the quals are
> > removed due to optimization or converted to join quals, then now, even
> > if the user has given qual on all the key columns the index scan will
> > not be selected because we will be forcing that the hash index can
> > only be selected if it has quals on all the key attributes?
> >
> > I don't think suggesting enable_hashjoin =off is a solution,
> >
>
> Yeah, this doesn't sound like a good idea. How about instead try to
> explore the idea where the hash (bucket assignment and search) will be
> based on the first index key and the other columns will be stored as
> payload? I think this might pose some difficulty in the consecutive
> patch to enable a unique index because it will increase the chance of
> traversing more buckets for uniqueness checks. If we see such
> problems, then I have another idea to minimize the number of buckets
> that we need to lock during uniqueness check which is by lock chaining
> as is used during hash bucket clean up where at a time we don't need
> to lock more than two buckets at a time.
I have presented a simple, almost trivial, patch to allow multi-col
hash indexes. It hashes the first column only, which can be a downside
in *some* cases. If that is clearly documented, it would not cause
many issues, IMHO. However, it does not have any optimization issues
or complexities, which is surely a very good thing.
Trying to involve *all* columns in the hash index is a secondary
optimization. It requires subtle changes in optimizer code, as Tom
points out. It also needs fine tuning to make the all-column approach
beneficial for the additional cases without losing against what the
"first column" approach gives.
I did consider both approaches and after this discussion I am still in
favour of committing the very simple "first column" approach to
multi-col hash indexes now.
--
Simon Riggs http://www.EnterpriseDB.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Amul Sul | 2021-10-05 10:41:58 | Re: [Patch] ALTER SYSTEM READ ONLY |
Previous Message | Bharath Rupireddy | 2021-10-05 10:35:04 | Re: Fix pg_log_backend_memory_contexts() 's delay |