From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Christensen <david(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Improving btree performance through specializing by key shape, take 2 |
Date: | 2023-09-19 13:28:26 |
Message-ID: | CAEze2Wi8=4FZHSn0e8orOjEmj01Ugf4heASUGKcnMURPEDwn8A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 19 Sept 2023 at 03:56, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Sep 18, 2023 at 6:29 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I also have significant doubts about your scheme for avoiding
> > invalidating the bounds of the page based on its high key matching the
> > parent's separator. The subtle dynamic prefix compression race
> > condition that I was worried about was one caused by page deletion.
> > But page deletion doesn't change the high key at all (it does that for
> > the deleted page, but that's hardly relevant). So how could checking
> > the high key possibly help?
>
> To be clear, page deletion does what I described here (it does an
> in-place update of the downlink to the deleted page, so the same pivot
> tuple now points to its right sibling, which is our page of concern),
> in addition to fully removing the original pivot tuple whose downlink
> originally pointed to our page of concern. This is why page deletion
> makes the key space "move to the right", very much like a page split
> would.
I am still aware of this issue, and I think we've discussed it in
detail earlier. I think it does not really impact this patchset. Sure,
I can't use dynamic prefix compression to its full potential, but I
still do get serious performance benefits:
FULL KEY _bt_compare calls:
'Optimal' full-tree DPT: average O(3)
Paged DPT (this patch): average O(2 * height)
... without HK opt: average O(3 * height)
Current: O(log2(n))
Single-attribute compares:
'Optimal' full-tree DPT: O(log(N))
Paged DPT (this patch): O(log(N))
Current: 0 (or, O(log(N) * natts))
So, in effect, this patch moves most compare operations to the level
of only one or two full key compare operations per page (on average).
I use "on average": on a sorted array with values ranging from
potentially minus infinity to positive infinity, it takes on average 3
compares before a binary search can determine the bounds of the
keyspace it has still to search. If one side's bounds is already
known, it takes on average 2 compare operations before these bounds
are known.
> IMV it would be better if it made the key space "move to the left"
> instead, which would make page deletion close to the exact opposite of
> a page split -- that's what the Lanin & Shasha paper does (sort of).
> If you have this symmetry, then things like dynamic prefix compression
> are a lot simpler.
>
> ISTM that the only way that a scheme like yours could work, assuming
> that making page deletion closer to Lanin & Shasha is not going to
> happen, is something even more invasive than that: it might work if
> you had a page low key (not just a high key) on every page.
Note that the "dynamic prefix compression" is currently only active on
the page level.
True, the patch does carry over _bt_compare's prefix result for the
high key on the child page, but we do that only if the highkey is
actually an exact copy of the right separator on the parent page. This
carry-over opportunity is extremely likely to happen, because the high
key generated in _bt_split is then later inserted on the parent page.
The only case where it could differ is in concurrent page deletions.
That is thus a case of betting a few cycles to commonly save many
cycles (memcmp vs _bt_compare full key compare.
Again, we do not actually skip a prefix on the compare call of the
P_HIGHKEY tuple, nor for the compares of the midpoints unless we've
found a tuple on the page that compares as smaller than the search
key.
> You'd have
> to compare the lower bound separator key from the parent (which might
> itself be the page-level low key for the parent) to the page low key.
> That's not a serious suggestion; I'm just pointing out that you need
> to be able to compare like with like for a canary condition like this
> one, and AFAICT there is no lightweight practical way of doing that
> that is 100% robust.
True, if we had consistent LOWKEYs on pages, that'd make this job much
easier: the prefix could indeed be carried over in full. But that's
not currently the case for the nbtree code, and this is the next best
thing, as it also has the benefit of working with all currently
supported physical formats of btree indexes.
Kind regards,
Matthias van de Meent
From | Date | Subject | |
---|---|---|---|
Next Message | torikoshia | 2023-09-19 13:39:59 | Re: RFC: Logging plan of the running query |
Previous Message | Peter Eisentraut | 2023-09-19 13:11:51 | Re: tablecmds.c/MergeAttributes() cleanup |