From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Andy Fan <zhihuifan1213(at)163(dot)com> |
Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Deleting older versions in unique indexes to avoid page splits |
Date: | 2024-11-07 14:21:09 |
Message-ID: | CAH2-WzkHF1VzZOOw2N-pc1q4kdeH7+62SwdqvbCVHeiB_radwg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Andy,
On Thu, Nov 7, 2024 at 3:05 AM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> So my questions are: (a) How does the "logically unchanged index" hint
> can be helpful for this purpose?
It's the main trigger for bottom-up index deletion. It is taken as a
signal that the leaf page might have garbage duplicates from non-HOT
updates.
What the hint *actually* indicates is that the incoming index tuple is
a new, unchanged version duplicate. But it's reasonable to guess that
this will also be true of existing tuples located on the same leaf
page as the one that the new, incoming tuple will go on. It's
especially worth making that guess when the only alternative is to
split the page (most of the time the hint isn't used at all, since the
incoming item already fits on its leaf page).
> (b). What does the latestRemovedXid
> means, and which variable in code is used for this purpose. I searched
> "latestRemovedXid" but nothing is found.
That's because the symbol name changed a year or two after this commit
went in. It is now snapshotConflictHorizon, but it has the same
purpose as before (to generate conflicts on hot standys when needed).
The commit message points out that simple deletion inherently requires
that we visit the heap to generate this snapshotConflictHorizon value
for the WAL record. And so the cost of considering whether we can
delete additional index tuples (not just those index tuples already
marked LP_DEAD in the index leaf page) is very low. We were already
visiting the heap pages that are used to check the "extra" index
tuples (if we weren't then those index tuples wouldn't be checked at
all).
In short, since the *added* cost of checking extra related index
tuples is very low, then it doesn't really matter if there are few or
no extra tuples that actually turn out to be deletable. We can be
aggressive because the cost of being wrong is hardly noticeable at
all.
> (c) What is the relationship
> between a and b. I mean if we *have to visit" the same table blocks (in
> the case of index-split?), then the IO-cost is paid anyway, do we still
> need the "logically unchanged index hint"?
Well, the commit message was written at a time when the only form of
deletion was deletion of index tuples marked LP_DEAD (now called
simple deletion). Prior to this work, we didn't attempt to delete any
extra tuples in passing during simple deletion. It is easy to justify
checking those extra index tuples/batching up work in that context.
It is a little harder to justify bottom-up index deletion (or was at
the time), because it will do work that is totally speculative -- the
entire process might not even delete a single index tuple. It is
important that we give up quickly when it isn't possible to delete a
single index tuple. We're better off falling back on nbtree index
deduplication. That way another bottom-up index deletion pass over the
same leaf page may succeed in the future (or may never be required).
> At last, appreciated for your effort on making this part much better!
Thanks
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Junwang Zhao | 2024-11-07 14:26:35 | Re: general purpose array_sort |
Previous Message | Junwang Zhao | 2024-11-07 14:06:04 | Re: general purpose array_sort |