From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Victor Yegorov <vyegorov(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Deleting older versions in unique indexes to avoid page splits |
Date: | 2020-11-25 18:41:15 |
Message-ID: | CAH2-Wz=ByoVLy0Vn_dcPebxgSV2ocrqo5zZZuyxWM71s7k=6OA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Nov 25, 2020 at 4:43 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
>> Then I had a much better idea: Make the existing LP_DEAD stuff a
>> little more like bottom-up index deletion. We usually have to access
>> heap blocks that the index tuples point to today, in order to have a
>> latestRemovedXid cutoff (to generate recovery conflicts). It's worth
>> scanning the leaf page for index tuples with TIDs whose heap block
>> matches the index tuples that actually have their LP_DEAD bits set.
>> This only consumes a few more CPU cycles. We don't have to access any
>> more heap blocks to try these extra TIDs, so it seems like a very good
>> idea to try them out.
>
>
> I don't seem to understand this.
>
> Is it: we're scanning the leaf page for all LP_DEAD tuples that point to the same
> heap block? Which heap block we're talking about here, the one that holds
> entry we're about to add (the one that triggered bottom-up-deletion due to lack
> of space I mean)?
No, the incoming tuple isn't significant.
As you know, bottom-up index deletion uses heuristics that are
concerned with duplicates on the page, and the "logically unchanged by
an UPDATE" hint that the executor passes to btinsert(). Bottom-up
deletion runs when all LP_DEAD bits have been cleared (either because
there never were any LP_DEAD bits set, or because they were set and
then deleted, which wasn't enough).
But before bottom-up deletion may run, traditional deletion of LP_DEAD
index tuples runs -- this is always our preferred strategy because
index tuples with their LP_DEAD bits set are already known to be
deletable. We can make this existing process (which has been around
since PostgreSQL 8.2) better by applying similar principles.
We have promising tuples for bottom-up deletion. Why not have
"promising heap blocks" for traditional LP_DEAD index tuple deletion?
Or if you prefer, we can consider index tuples that *don't* have their
LP_DEAD bits set already but happen to point to the *same heap block*
as other tuples that *do* have their LP_DEAD bits set promising. (The
tuples with their LP_DEAD bits set are not just promising -- they're
already a sure thing.)
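
To make the "promising heap blocks" idea concrete, here is a minimal sketch in C of how the extra TIDs might be picked out of a leaf page. It is an illustration only: the LeafItem struct and both function names are hypothetical simplifications, not nbtree's actual data structures or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an index tuple on a leaf page. */
typedef struct LeafItem
{
    uint32_t heap_block;   /* heap block number from the tuple's TID */
    uint16_t heap_offset;  /* offset number within that heap block */
    bool     lp_dead;      /* is the LP_DEAD bit already set? */
} LeafItem;

/* Does any LP_DEAD-marked item on the page point into 'block'? */
static bool
block_has_dead_item(const LeafItem *items, size_t nitems, uint32_t block)
{
    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i].lp_dead && items[i].heap_block == block)
            return true;
    }
    return false;
}

/*
 * Collect the positions of all items worth checking in the heap:
 * items with LP_DEAD set (a sure thing), plus items without LP_DEAD
 * set that point to the same heap block as some LP_DEAD item
 * (merely promising).  Returns the number of positions written to
 * 'out'.  The nested scan is quadratic, which is acceptable for a
 * sketch over a single page's worth of items.
 */
static size_t
collect_deletion_candidates(const LeafItem *items, size_t nitems,
                            size_t *out, size_t max_out)
{
    size_t n = 0;

    assert(items != NULL || nitems == 0);

    for (size_t i = 0; i < nitems && n < max_out; i++)
    {
        if (items[i].lp_dead ||
            block_has_dead_item(items, nitems, items[i].heap_block))
            out[n++] = i;
    }
    return n;
}
```

Note that the set of distinct heap blocks among the candidates is the same as the set among the LP_DEAD items alone, so no additional heap blocks get visited.
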
This means that traditional LP_DEAD deletion is now slightly more
speculative in one way (it speculates about what is likely to be true
using heuristics). But it's much less speculative than bottom-up index
deletion. We are required to visit these heap blocks anyway, since a
call to _bt_delitems_delete() for LP_DEAD deletion must already call
table_compute_xid_horizon_for_tuples(), which has to access the blocks
to get a latestRemovedXid for the WAL record.
The only thing that we have to lose here is a few CPU cycles to find
extra TIDs to consider. We'll visit exactly the same number of heap
blocks as before. (Actually, _bt_delitems_delete() does not have to do
that in all cases, but it has to do it with a logged table
with wal_level >= replica, which is the vast majority of cases in
practice.)
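
The condition in that parenthetical can be written down as a small predicate. This is only a sketch of the rule as stated above: the WalLevel enum and needs_xid_horizon() are hypothetical names for illustration, not the checks that nbtree actually performs.

```c
#include <stdbool.h>

/* Hypothetical model of the wal_level setting, ordered from least to most. */
typedef enum WalLevel
{
    WAL_LEVEL_MINIMAL_SKETCH,
    WAL_LEVEL_REPLICA_SKETCH,
    WAL_LEVEL_LOGICAL_SKETCH
} WalLevel;

/*
 * A latestRemovedXid cutoff is only needed to generate recovery
 * conflicts, which requires a logged table and wal_level >= replica.
 */
static bool
needs_xid_horizon(bool table_is_logged, WalLevel wal_level)
{
    return table_is_logged && wal_level >= WAL_LEVEL_REPLICA_SKETCH;
}
```
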
This means that traditional LP_DEAD deletion reuses some of the
bottom-up index deletion infrastructure. So maybe nbtree never calls
table_compute_xid_horizon_for_tuples() now, since everything goes
through the new heapam stuff instead (which knows how to check extra
TIDs that might not be dead at all).
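
As a rough sketch of what "checking extra TIDs that might not be dead at all" could look like on the table AM side: every candidate gets a heap check, only the ones that turn out dead are marked deletable, and the cutoff is taken over those. Everything here is an assumption made for illustration. The Candidate struct and check_candidates() are hypothetical, the heap visibility check is reduced to a precomputed flag, and XIDs are compared as plain integers with no wraparound handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate TID handed over by the index AM. */
typedef struct Candidate
{
    uint32_t heap_block;       /* heap block the TID points to */
    bool     heap_tuple_dead;  /* stand-in for the heap visibility check */
    uint32_t removed_xid;      /* XID that removed the heap tuple, if dead */
    bool     deletable;        /* output: safe to delete from the index? */
} Candidate;

/*
 * Mark each candidate deletable only if its heap tuple really is dead,
 * and return the largest removing XID among the deletable ones
 * (0 means "none").  Candidates that are not dead are simply skipped.
 */
static uint32_t
check_candidates(Candidate *cands, size_t ncands)
{
    uint32_t latest_removed = 0;

    for (size_t i = 0; i < ncands; i++)
    {
        cands[i].deletable = cands[i].heap_tuple_dead;
        if (cands[i].deletable && cands[i].removed_xid > latest_removed)
            latest_removed = cands[i].removed_xid;
    }
    return latest_removed;
}
```
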
> I am missing a general perspective here.
>
> Is it true, that despite the long (vacuum preventing) transaction we can re-use space,
> as after the DELETE statements commits, IndexScans are setting LP_DEAD hints after
> they check the state of the corresponding heap tuple?
The enhancement to traditional LP_DEAD deletion that I just described
does not affect the current restrictions on setting LP_DEAD bits in
the presence of a long-running transaction, or anything like that.
That seems like an unrelated project. The value of this enhancement is
purely its ability to delete *extra* index tuples that could have had
their LP_DEAD bits set already (it was possible in principle), but
didn't. And only when they are near index tuples that really do
have their LP_DEAD bits set.
> I haven't done any testing so far since sending my last e-mail.
> If you'll have a chance to send a new v10 version with LP_DEAD-deletion-with-extra-TIDs thing,
> I will do some tests (planned).
Thanks! I think that it will be next week. It's a relatively big change.
--
Peter Geoghegan