Re: Deleting older versions in unique indexes to avoid page splits

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2024-11-08 00:31:12
Message-ID: 87r07mwoq7.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:

Hi Peter,

I think I understand the main idea now with your help, I'd like to
repeat my understanding for your double check.

> On Thu, Nov 7, 2024 at 3:05 AM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>> So my questions are: (a) How does the "logically unchanged index" hint
>> can be helpful for this purpose?
>
> It's the main trigger for bottom-up index deletion. It is taken as a
> signal that the leaf page might have garbage duplicates from non-HOT
> updates.

> What the hint *actually* indicates is that the incoming index tuple is
> a new, unchanged version duplicate. But it's reasonable to guess that
> this will also be true of existing tuples located on the same leaf
> page as the one that the new, incoming tuple will go on.

We triggered with this style(non-hot + unchanged index + update)
because:

(1). HOT UPDATE doesn't duplicate the index entry, so it should be
safely ignored.
(2). INSERT doesn't generate old tuples, so it should be safely
ignored.
(3). DELETE does generate new index entry, but we might not touch
the indexes at all during deletes (except the index we used for index
scan), so we have no chances to mark the "logically unchanged index"
hint.

With the (1) (2) (3), only non-hot UPDATE is considered.

Then why do we need "unchanged index" as a prerequisite? that is because it
is reasonable to guess that the "unchanged version" will be in the same
leaf page as the old one (Let's call it as leaf page A). Otherwise even
there are some old index entries, they are likely in a different leaf
page (leaf page B), and even they are deletable, but such deletes would
not be helpful for delay the current leaf page(A) from splitting. So it
is not considered now. AND since we doesn't touch leaf page B at all during
"changed index datum" update, so we have no chances to mark any hint on
page B, so we can do nothing for it.

> It's
> especially worth making that guess when the only alternative is to
> split the page (most of the time the hint isn't used at all, since the
> incoming item already fits on its leaf page).

OK.

>> (b). What does the latestRemovedXid
>> means, and which variable in code is used for this purpose. I searched
>> "latestRemovedXid" but nothing is found.
>
> That's because the symbol name changed a year or two after this commit
> went in. It is now snapshotConflictHorizon, but it has the same
> purpose as before (to generate conflicts on hot standys when needed).

OK. I'd try to understand why we need such snapshotConflictHorizon logic
later. So can I understand the relationship between
"snapshotConflictHorizon" and "(non-hot + unchanged index + update)" is
[OR], which means *either* of them is true, bottom-to-up deletes
will be triggered.

> The commit message points out that simple deletion inherently requires
> that we visit the heap to generate this snapshotConflictHorizon value
> for the WAL record. And so the cost of considering whether we can
> delete additional index tuples (not just those index tuples already
> marked LP_DEAD in the index leaf page) is very low. We were already
> visiting the heap pages that are used to check the "extra" index
> tuples (if we weren't then those index tuples wouldn't be checked at
> all).
OK.

> In short, since the *added* cost of checking extra related index
> tuples is very low, then it doesn't really matter if there are few or
> no extra tuples that actually turn out to be deletable. We can be
> aggressive because the cost of being wrong is hardly noticeable at
> all.
OK.

>> (c) What is the relationship
>> between a and b. I mean if we *have to visit" the same table blocks (in
>> the case of index-split?), then the IO-cost is paid anyway, do we still
>> need the "logically unchanged index hint"?
>
> Well, the commit message was written at a time when the only form of
> deletion was deletion of index tuples marked LP_DEAD (now called
> simple deletion). Prior to this work, we didn't attempt to delete any
> extra tuples in passing during simple deletion. It is easy to justify
> checking those extra index tuples/batching up work in that context.
OK.

> It is a little harder to justify bottom-up index deletion (or was at
> the time), because it will do work that is totally speculative -- the
> entire process might not even delete a single index tuple.

I understand this as some activities blocked OldestXmin to
advacne. e.g. long transactions.

> It is
> important that we give up quickly when it isn't possible to delete a
> single index tuple. We're better off falling back on nbtree index
> deduplication. That way another bottom-up index deletion pass over the
> same leaf page may succeed in the future (or may never be required).
OK.

Thanks for your answering!

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-11-08 00:38:45 Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Previous Message Michael Paquier 2024-11-08 00:01:20 Re: general purpose array_sort