BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
Date: 2024-04-10 11:00:00
Message-ID: 18426-2d18da6586f152d6@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18426
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 16.2
Operating system: Ubuntu 22.04
Description:

When running test 027_stream_regress on a rather slow ARMv7 device, I
came upon the following test failure:

[21:31:30.789](489.962s) # poll_query_until timed out executing this
query:
# SELECT '0/16952DE0' <= replay_lsn AND state = 'streaming'
# FROM pg_catalog.pg_stat_replication
# WHERE application_name IN ('standby_1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
timed out waiting for catchup at t/027_stream_regress.pl line 100.

log/027_stream_regress_standby_1.log ends with:
...
2024-04-04 21:04:01.946 UTC [17903] LOG: recovery restart point at
0/11C71430
2024-04-04 21:04:01.946 UTC [17903] DETAIL: Last completed transaction was
at log time 2024-04-04 21:02:19.456471+03.
2024-04-04 21:06:16.014 UTC [17905] WARNING: page 7 of relation
base/16384/2606 does not exist
2024-04-04 21:06:16.014 UTC [17905] CONTEXT: WAL redo at 0/13841DE8 for
Heap2/PRUNE: snapshotConflictHorizon: 13050, nredirected: 0, ndead: 14,
nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5, 6, 28, 31, 32, 33, 34, 35,
36, 37], unused: [7]; blkref #0: rel 1663/16384/2606, blk 7
2024-04-04 21:06:16.014 UTC [17905] PANIC: WAL contains references to
invalid pages
2024-04-04 21:06:16.014 UTC [17905] CONTEXT: WAL redo at 0/13841DE8 for
Heap2/PRUNE: snapshotConflictHorizon: 13050, nredirected: 0, ndead: 14,
nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5, 6, 28, 31, 32, 33, 34, 35,
36, 37], unused: [7]; blkref #0: rel 1663/16384/2606, blk 7
2024-04-04 21:06:16.927 UTC [17888] LOG: startup process (PID 17905) was
terminated by signal 6: Aborted

t_027_stream_regress_standby_1_data/pgdata/pg_wal/000000010000000000000013
contains:
...
rmgr: XLOG len (rec/tot): 49/ 8241, tx: 13103, lsn:
0/137C3538, prev 0/137C14E8, desc: FPI , blkref #0: rel 1663/16384/2606 fork
vm blk 0 FPW
rmgr: Storage len (rec/tot): 46/ 46, tx: 13103, lsn:
0/137C5588, prev 0/137C3538, desc: TRUNCATE base/16384/2606 to 7 blocks
flags 7
...
rmgr: Transaction len (rec/tot): 38/ 38, tx: 13103, lsn:
0/137C7588, prev 0/137C7540, desc: ABORT 2024-04-04 21:05:31.176869 UTC
### That is, the relation 2606 (pg_constraint) was truncated in
### transaction 13103, and then the transaction was aborted.
...
rmgr: Heap2 len (rec/tot): 85/ 85, tx: 0, lsn:
0/13841DE8, prev 0/13841DA8, desc: PRUNE snapshotConflictHorizon: 13050,
nredirected: 0, ndead: 14, nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5,
6, 28, 31, 32, 33, 34, 35, 36, 37], unused: [7], blkref #0: rel
1663/16384/2606 blk 7
### The record that standby could not replay because it did truncate
### pg_constraint when replaying 0/137C5588.

027_stream_regress_primary.log contains:
2024-04-04 21:05:31.176 UTC [29329] ERROR: canceling autovacuum task
2024-04-04 21:05:31.176 UTC [29329] CONTEXT: while truncating relation
"pg_catalog.pg_constraint" to 7 blocks
automatic vacuum of table "regression.pg_catalog.pg_constraint"

As far as I can see, the TRUNCATE record was WAL-logged by the autovacuum
process in table_relation_vacuum() -> heap_vacuum_rel() ->
lazy_truncate_heap() -> RelationTruncate(), and then that process was
interrupted inside RelationTruncate() -> smgrtruncate() ->
DropRelationBuffers() -> InvalidateBuffer() -> WaitIO().

A demo test for the issue to follow...

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-04-10 14:16:41 Re: BUG #18422: Assert in expandTupleDesc() fails on row mismatch with additional SRF
Previous Message PG Bug reporting form 2024-04-10 10:06:40 BUG #18425: KB5036892 Microsoft Update Package Issue