pgsql: Fix corruption when relation truncation fails.

From: Thomas Munro <tmunro(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Fix corruption when relation truncation fails.
Date: 2024-12-20 11:01:03
Message-ID: E1tOakg-000VgQ-Ke@gemulon.postgresql.org
Lists: pgsql-committers

Fix corruption when relation truncation fails.

RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:

1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.

Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs. All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done. When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it). When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).
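
The first hazard can be illustrated with a small toy model in C. This is
only a sketch under stated assumptions: ToyRelation and every toy_*
function are hypothetical names invented for illustration, not PostgreSQL
APIs, and each page is reduced to a single version number (1 = old
on-disk page that still contains the tuples, 2 = newer dirty page held
in a buffer).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 4

/* Hypothetical toy relation; not a PostgreSQL data structure. */
typedef struct ToyRelation {
    int  disk_version[TOY_NBLOCKS];   /* page version last written to disk */
    int  cache_version[TOY_NBLOCKS];  /* newer version held in a buffer */
    bool cached[TOY_NBLOCKS];         /* is the block in the buffer pool? */
    int  nblocks;                     /* current length of the file on disk */
} ToyRelation;

/* Every block starts with an old page on disk and a newer dirty buffer. */
void toy_init(ToyRelation *rel)
{
    rel->nblocks = TOY_NBLOCKS;
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        rel->disk_version[i] = 1;
        rel->cache_version[i] = 2;
        rel->cached[i] = true;
    }
}

/* Step 2: discard buffers at or beyond new_nblocks, dirty or not. */
void toy_drop_buffers(ToyRelation *rel, int new_nblocks)
{
    assert(new_nblocks >= 0 && new_nblocks <= TOY_NBLOCKS);
    for (int i = new_nblocks; i < TOY_NBLOCKS; i++)
        rel->cached[i] = false;
}

/* Step 3: shorten the file; 'fail' simulates a file API failure. */
bool toy_truncate_file(ToyRelation *rel, int new_nblocks, bool fail)
{
    if (fail)
        return false;       /* file length is left unchanged */
    rel->nblocks = new_nblocks;
    return true;
}

/* Read a block from the buffer if present, else from disk; -1 past EOF. */
int toy_read_block(const ToyRelation *rel, int blkno)
{
    assert(blkno >= 0);
    if (blkno >= rel->nblocks)
        return -1;
    if (rel->cached[blkno])
        return rel->cache_version[blkno];
    return rel->disk_version[blkno];
}
```

In this model, calling toy_drop_buffers(&rel, 2) and then a failing
toy_truncate_file(&rel, 2, true) leaves block 3 readable again, but
toy_read_block() now returns the old version from disk rather than the
discarded dirty one, which is the "old page versions could come back"
situation described above.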

Changes:

* WAL is now unconditionally flushed first
* smgrtruncate() is now called in a critical section, preventing
  interrupts and causing PANIC on file API failure
* smgrtruncate() has a new parameter for existing fork sizes,
  because it can't call smgrnblocks() itself inside a critical section
The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map(). The last of these is also brought up to date with
other evolutions of the truncation protocol.

The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch. The more recently introduced cancellation bug was
diagnosed by Alexander Lakhin. Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.

Back-patch to all supported releases.

Reviewed-by: Michael Paquier <michael(at)paquier(dot)xyz>
Reviewed-by: Robert Haas <robertmhaas(at)gmail(dot)com>
Reported-by: rootcause000(at)gmail(dot)com
Reported-by: Alexander Lakhin <exclusion(at)gmail(dot)com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org

Branch
------
REL_13_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/2280912165d62a8b1de477818a405a76ffc66b2e

Modified Files
--------------
contrib/pg_visibility/pg_visibility.c | 32 ++++++++++++++++++++-----
src/backend/catalog/storage.c         | 44 +++++++++++++++++++++++++----------
src/backend/storage/smgr/md.c         | 28 +++++++++++++++-------
src/backend/storage/smgr/smgr.c       | 14 +++++++----
src/include/storage/md.h              |  2 +-
src/include/storage/smgr.h            |  5 ++--
6 files changed, 92 insertions(+), 33 deletions(-)
