From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hash index build performance tweak from sorting |
Date: | 2022-05-04 10:27:34 |
Message-ID: | CAA4eK1JiCB85JPe-HT6hA9qEZe4HO1FDAJxJSAyVcptTg0wRHA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, May 2, 2022 at 9:28 PM Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
>
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs
> > <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> > >
> > > Hash index pages are stored in sorted order, but we don't prepare the
> > > data correctly.
> > >
> > > We sort the data as the first step of a hash index build, but we
> > > forget to sort the data by hash as well as by hash bucket.
> > >
> >
> > I was looking into the nearby comments (Fetch hash keys and mask off
> > bits we don't want to sort by.) and it sounds like we purposefully
> > don't want to sort by the hash key. I see that this comment was
> > originally introduced in the below commit:
> >
> > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
> > Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> > Date: Mon Sep 15 18:43:41 2008 +0000
> >
> > Change hash indexes to store only the hash code rather than the
> > whole indexed value.
> >
> > But even before that commit, we seem to have masked off the bits
> > before comparison. Are we doing so because we want to preserve the
> > order of hash keys within a particular bucket, which made such
> > masking necessary?
>
> We need to sort by both hash bucket and hash value.
>
> We sort by hash bucket id so we can identify the correct hash bucket
> to insert into.
>
> But then on each bucket/overflow page we store the entries sorted by
> hash value to make lookups faster, so inserts go faster if the input
> is also sorted by hash value.
>
I also think so. So, we should go with this unless someone else sees
any flaw here.
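
To make the proposed ordering concrete, here is a minimal sketch in Python. It is not PostgreSQL source: the function names are hypothetical, and the bucket mapping is only a simplified model of the mask logic (mask with the high mask, fall back to the low mask if the bucket does not exist yet). The mask values in the usage note are made up for illustration.

```python
from typing import List, Tuple


def hashkey_to_bucket(hashkey: int, max_bucket: int,
                      high_mask: int, low_mask: int) -> int:
    """Map a hash value to a bucket number (simplified model, an assumption)."""
    bucket = hashkey & high_mask
    if bucket > max_bucket:
        # That bucket has not been created yet; use the smaller mask.
        bucket &= low_mask
    return bucket


def build_sort_key(hashkey: int, max_bucket: int,
                   high_mask: int, low_mask: int) -> Tuple[int, int]:
    """Sort key discussed in this thread: bucket first, then full hash value."""
    return (hashkey_to_bucket(hashkey, max_bucket, high_mask, low_mask), hashkey)


def sort_by_bucket_only(hashkeys: List[int], max_bucket: int,
                        high_mask: int, low_mask: int) -> List[int]:
    """Ordering that compares only the masked bucket number."""
    return sorted(
        hashkeys,
        key=lambda h: hashkey_to_bucket(h, max_bucket, high_mask, low_mask),
    )


def sort_by_bucket_and_hash(hashkeys: List[int], max_bucket: int,
                            high_mask: int, low_mask: int) -> List[int]:
    """Ordering that also sorts by hash value within each bucket."""
    return sorted(
        hashkeys,
        key=lambda h: build_sort_key(h, max_bucket, high_mask, low_mask),
    )
```

With max_bucket=3, high_mask=7, low_mask=3 and the input [13, 5, 9, 1, 6, 2], the bucket-only ordering leaves the bucket-1 entries as 13, 5, 9, 1 (input order), while the bucket-and-hash ordering yields 1, 5, 9, 13 followed by 2, 6, so entries arrive already in the order a page sorted by hash value wants them.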
--
With Regards,
Amit Kapila.