pgsql: Avoid killing btree items that are already dead

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Avoid killing btree items that are already dead
Date: 2020-05-15 20:51:55
Message-ID: E1jZhJD-0000QS-Fs@gemulon.postgresql.org
Lists: pgsql-committers

Avoid killing btree items that are already dead

_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historically okay, since killing a dead item has no
consequences. However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page. Multiple concurrent processes would write the same page several
times, leading to WAL bloat. The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.
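
The effect of that check can be illustrated with a small standalone model.
This is only a sketch, not the committed patch: ModelItem, ModelPage,
kill_items_unconditional and kill_items_if_alive are hypothetical names
invented here, and the times_dirtied counter merely stands in for the
full-page image that dirtying a page for a hint bit may cost when data
checksums or wal_log_hints are enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real structures. */
typedef struct
{
    bool        dead;           /* models the "dead" hint on an index item */
} ModelItem;

typedef struct
{
    ModelItem  *items;
    size_t      nitems;
    int         times_dirtied;  /* each increment models a possible FPI */
} ModelPage;

/*
 * Old behaviour (as described above): mark every listed item dead, even if
 * it already is, so the page is dirtied again for no benefit.
 */
bool
kill_items_unconditional(ModelPage *page, const size_t *killed, size_t nkilled)
{
    bool        killed_something = false;

    assert(page != NULL);
    for (size_t i = 0; i < nkilled; i++)
    {
        if (killed[i] >= page->nitems)
            continue;           /* ignore out-of-range offsets */
        page->items[killed[i]].dead = true;
        killed_something = true;
    }
    if (killed_something)
        page->times_dirtied++;
    return killed_something;
}

/*
 * New behaviour: only mark items that are not already dead, and only dirty
 * the page if at least one item actually changed state.
 */
bool
kill_items_if_alive(ModelPage *page, const size_t *killed, size_t nkilled)
{
    bool        killed_something = false;

    assert(page != NULL);
    for (size_t i = 0; i < nkilled; i++)
    {
        if (killed[i] >= page->nitems)
            continue;           /* ignore out-of-range offsets */
        if (!page->items[killed[i]].dead)
        {
            page->items[killed[i]].dead = true;
            killed_something = true;
        }
    }
    if (killed_something)
        page->times_dirtied++;
    return killed_something;
}
```

In this model, a second call to kill_items_if_alive with the same offsets
leaves times_dirtied unchanged, whereas kill_items_unconditional increments
it again. The model is single-threaded: under a share lock two real backends
can still both see an item as not yet dead, which is why the message speaks
of reducing the probability rather than eliminating the problem.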

The problem could be eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.

Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.

Author: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/09f2752b0465f4df5b14ea5cf4a8c142aa694d65

Modified Files
--------------
src/backend/access/nbtree/nbtutils.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
