pgsql: Deprecate nbtree's BTP_HAS_GARBAGE flag.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Deprecate nbtree's BTP_HAS_GARBAGE flag.
Date: 2020-11-17 17:46:56
Message-ID: E1kf54C-0005D3-2R@gemulon.postgresql.org
Lists: pgsql-committers

Deprecate nbtree's BTP_HAS_GARBAGE flag.

Streamline handling of the various strategies that we have to avoid a
page split in nbtinsert.c. When it looks like a leaf page is about to
overflow, we now delete LP_DEAD items and perform deduplication in one
central place. This greatly simplifies _bt_findinsertloc().

This has an independently useful consequence: nbtree no longer relies on
the BTP_HAS_GARBAGE page-level flag/hint for anything important. We
still set and unset the flag in the same way as before, but it's no
longer treated as a gating condition when considering if we should check
for already-set LP_DEAD bits. This happens at the point where the page
looks like it might have to be split anyway, so simply checking the
LP_DEAD bits in passing is practically free. This avoids missing
LP_DEAD bits just because the page-level hint is unset, which is
probably fairly common (e.g. it happens when VACUUM unsets the
page-level flag without actually removing index tuples whose LP_DEAD bit
was set recently, after the VACUUM operation began but before it reached
the leaf page in question).
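
As a rough illustration only (this is not the nbtree code), the control
flow described above can be modeled on a toy page. Every name here
(ToyPage, toy_avoid_split, and so on) is hypothetical, and "deduplication"
is reduced to collapsing adjacent equal keys, standing in for merging
duplicates into a posting list. The point of the sketch is that the
page-level hint is never consulted before the per-item dead bits are
scanned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 8			/* hypothetical page capacity */

typedef struct ToyItem
{
	int			key;
	bool		lp_dead;		/* stands in for an item's LP_DEAD bit */
} ToyItem;

typedef struct ToyPage
{
	ToyItem		items[TOY_MAX_ITEMS];
	size_t		nitems;
	bool		has_garbage_hint;	/* stands in for BTP_HAS_GARBAGE */
} ToyPage;

/* Remove every item whose dead bit is set; returns the number removed. */
static size_t
toy_delete_dead_items(ToyPage *page)
{
	size_t		kept = 0;
	size_t		removed;

	for (size_t i = 0; i < page->nitems; i++)
	{
		if (!page->items[i].lp_dead)
			page->items[kept++] = page->items[i];
	}
	removed = page->nitems - kept;
	page->nitems = kept;
	page->has_garbage_hint = false;		/* the hint is maintained, not trusted */
	return removed;
}

/* Toy deduplication: collapse runs of adjacent equal keys. */
static size_t
toy_deduplicate(ToyPage *page)
{
	size_t		kept = 0;
	size_t		removed;

	for (size_t i = 0; i < page->nitems; i++)
	{
		if (kept == 0 || page->items[kept - 1].key != page->items[i].key)
			page->items[kept++] = page->items[i];
	}
	removed = page->nitems - kept;
	page->nitems = kept;
	return removed;
}

/*
 * One central place for split avoidance.  Returns true if the page has
 * room for one more item afterwards, false if a split is still needed.
 */
static bool
toy_avoid_split(ToyPage *page, bool dedup_allowed)
{
	assert(page->nitems <= TOY_MAX_ITEMS);

	if (page->nitems < TOY_MAX_ITEMS)
		return true;			/* page is not about to overflow */

	/* has_garbage_hint is deliberately not checked here */
	toy_delete_dead_items(page);
	if (page->nitems < TOY_MAX_ITEMS)
		return true;

	if (dedup_allowed)
		toy_deduplicate(page);

	return page->nitems < TOY_MAX_ITEMS;
}
```

With this shape, a full page whose hint is unset but which still holds one
dead item avoids the split, which is the case the paragraph above
describes for VACUUM.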

Note that this isn't a big behavioral change compared to PostgreSQL 13.
Before considering a deduplication pass, we were already checking for
set LP_DEAD bits regardless of whether the BTP_HAS_GARBAGE page-level
flag was set. This commit only goes slightly further by doing the same
check for all indexes, even indexes where deduplication won't be
performed.

We don't completely remove the BTP_HAS_GARBAGE flag. We still rely on
it as a gating condition with pg_upgrade'd indexes from before B-tree
version 4/PostgreSQL 12. That makes sense because we sometimes have to
make a choice among pages full of duplicates when inserting a tuple with
pre-version 4 indexes. It probably still pays to avoid accessing the
line pointer array of a page there, since it won't yet be clear whether
we'll insert onto the page in question at all, let alone split it as a
result.
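
A minimal sketch of that remaining gating rule, under the assumption that
it can be reduced to a single predicate. The names ToyLeaf,
TOY_BTREE_VERSION_4 and toy_should_check_lp_dead are hypothetical and are
not taken from nbtree.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: models "B-tree version 4" as mentioned in the text above. */
#define TOY_BTREE_VERSION_4 4

typedef struct ToyLeaf
{
	bool		has_garbage_hint;	/* stands in for BTP_HAS_GARBAGE */
} ToyLeaf;

/*
 * Decide whether it is worth looking at a leaf page's per-item dead bits.
 *
 * Version 4 and later: only asked once the page is about to be split, so
 * the check is practically free and the hint is ignored.
 *
 * Earlier versions: we may be choosing among several pages full of
 * duplicates, so the hint still gates access to the line pointer array.
 */
static bool
toy_should_check_lp_dead(const ToyLeaf *leaf, int btree_version)
{
	assert(leaf != NULL);

	if (btree_version >= TOY_BTREE_VERSION_4)
		return true;

	return leaf->has_garbage_hint;
}
```

The design choice being modeled is a cost trade-off: the hint only earns
its keep where scanning the page might turn out to be wasted work.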

Author: Peter Geoghegan <pg(at)bowt(dot)ie>
Reviewed-By: Victor Yegorov <vyegorov(at)gmail(dot)com>
Discussion: https://postgr.es/m/CAH2-Wz%3DYpc1PDdk8OVJDChGJBjT06%3DA0Mbv9HyTLCsOknGcUFg%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cf2acaf4dcb5e20204dcec4d698cb4478af533e7

Modified Files
--------------
src/backend/access/nbtree/nbtdedup.c  |  76 ++++--------------
src/backend/access/nbtree/nbtinsert.c | 142 +++++++++++++++++++++++-----------
src/backend/access/nbtree/nbtpage.c   |  33 ++++----
src/backend/access/nbtree/nbtutils.c  |   3 +-
src/backend/access/nbtree/nbtxlog.c   |   3 +-
src/include/access/nbtree.h           |   8 +-
6 files changed, 135 insertions(+), 130 deletions(-)