BUG #18811: PANIC,XX000,"WAL contains references to invalid pages"

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: bungina(at)gmail(dot)com
Subject: BUG #18811: PANIC,XX000,"WAL contains references to invalid pages"
Date: 2025-02-13 12:19:18
Message-ID: 18811-dbd06bbde2609075@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18811
Logged by: Polina Bungina
Email address: bungina(at)gmail(dot)com
PostgreSQL version: 16.6
Operating system: Ubuntu 22.04
Description:

We have just encountered this problem for the second time within a month.

The standby starts to panic after the following sequence of events:

[primary]

2025-02-12 06:22:44.160 UTC,,,3268046,,67ac3e15.31ddce,63,,2025-02-12 06:22:13 UTC,614/47144,1969937340,ERROR,57014,"canceling autovacuum task",,,,,"while truncating relation ""data.slice_trigger_13"" to 0 blocks
automatic vacuum of table ""db1.data.slice_trigger_13""",,,,"","autovacuum worker",,0
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263558,"10.2.30.104:42868",67ac3c0c.31cc46,75,"BIND waiting",2025-02-12 06:13:32 UTC,412/2708655,1969937332,LOG,00000,"process 3263558 acquired RowExclusiveLock on relation 35182 of database 16710 after 1184.572 ms",,,,,,"delete from ""data"".""slice_trigger"" where ""data"".""slice_trigger"".""st_id"" in ($1)",,,"PostgreSQL JDBC Driver","client backend",,-7230379448131803312
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263678,"10.2.30.104:56256",67ac3c1a.31ccbe,55,"PARSE waiting",2025-02-12 06:13:46 UTC,427/2212958,0,LOG,00000,"process 3263678 acquired AccessShareLock on relation 35182 of database 16710 after 1042.391 ms",,,,,,"with ""first_select"" as (select * from data.slice_trigger_13 order by ""st_occurred_at"" limit $1), ""windowed_select"" as (select rank() over (partition by ""st_unit_identifier"", ""st_reservation_identifier"" order by ""st_occurred_at"", ""st_id"") as ""row_rank"", * from first_select) select * from windowed_select where ""row_rank"" = $2",39,,"PostgreSQL JDBC Driver","client backend",,0
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263678,"10.2.30.104:56256",67ac3c1a.31ccbe,56,"PARSE",2025-02-12 06:13:46 UTC,427/2212958,0,LOG,00000,"duration: 1042.822 ms parse <unnamed>: with ""first_select"" as (select * from data.slice_trigger_13 order by ""st_occurred_at"" limit $1), ""windowed_select"" as (select rank() over (partition by ""st_unit_identifier"", ""st_reservation_identifier"" order by ""st_occurred_at"", ""st_id"") as ""row_rank"", * from first_select) select * from windowed_select where ""row_rank"" = $2",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,-4561709464811390503

[replica]

2025-02-12 06:23:25.302 UTC,,,413151,,6790b32a.64ddf,184,,2025-01-22 08:58:18 UTC,1/0,0,WARNING,01000,"page 1 of relation base/16710/35182 is uninitialized",,,,,"WAL redo at 1732/47969908 for Heap2/VISIBLE: snapshotConflictHorizon: 0, flags: 0x03; blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW; blkref #1: rel 1663/16710/35182, blk 1",,,,"","startup",,0
2025-02-12 06:23:25.302 UTC,,,413151,,6790b32a.64ddf,185,,2025-01-22 08:58:18 UTC,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"WAL redo at 1732/47969908 for Heap2/VISIBLE: snapshotConflictHorizon: 0, flags: 0x03; blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW; blkref #1: rel 1663/16710/35182, blk 1",,,,"","startup",,0
2025-02-12 06:23:26.233 UTC,,,413146,,6790b32a.64dda,7,,2025-01-22 08:58:18 UTC,,0,LOG,00000,"startup process (PID 413151) was terminated by signal 6: Aborted",,,,,,,,,"","postmaster",,0
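
For readers decoding the redo context in the replica log above, here is a minimal Python sketch that extracts the block references from that string. It is an illustration only, not part of the original report: the function name `parse_blkrefs` and the regular expression are assumptions made for this example. The fork-number mapping (0 = main, 1 = fsm, 2 = vm, 3 = init) follows PostgreSQL's ForkNumber enum, so "fork 2" in the log is the visibility map fork, matching the "fork vm" shown in the pg_waldump output.

```python
import re

# Fork numbers as defined by PostgreSQL's ForkNumber enum.
FORK_NAMES = {0: "main", 1: "fsm", 2: "vm", 3: "init"}

# Matches e.g. "blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW"
# and "blkref #1: rel 1663/16710/35182, blk 1" (no fork given means main).
# The pattern is an assumption based only on the log lines in this report.
BLKREF_RE = re.compile(
    r"blkref #(?P<id>\d+): rel (?P<spc>\d+)/(?P<db>\d+)/(?P<rel>\d+)"
    r"(?:, fork (?P<fork>\d+))?, blk (?P<blk>\d+)(?P<fpw> FPW)?"
)


def parse_blkrefs(context):
    """Return one dict per block reference found in a redo context string."""
    refs = []
    for m in BLKREF_RE.finditer(context):
        fork = int(m.group("fork") or 0)
        refs.append({
            "id": int(m.group("id")),
            "tablespace": int(m.group("spc")),
            "database": int(m.group("db")),
            "relfilenode": int(m.group("rel")),
            "fork": FORK_NAMES.get(fork, str(fork)),
            "block": int(m.group("blk")),
            "fpw": m.group("fpw") is not None,
        })
    return refs


# Redo context copied from the WARNING/PANIC entries in the replica log.
context = (
    "WAL redo at 1732/47969908 for Heap2/VISIBLE: "
    "snapshotConflictHorizon: 0, flags: 0x03; "
    "blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW; "
    "blkref #1: rel 1663/16710/35182, blk 1"
)

for ref in parse_blkrefs(context):
    print(ref)
```

Read this way, the record references block 0 of the visibility map fork (with a full-page image) and block 1 of the main fork of the same relation, and the preceding WARNING reports that main-fork block 1 is uninitialized on the standby.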

[wal records for the problematic table around 1732/47969908]

rmgr: Heap len (rec/tot): 54/ 54, tx: 1969939280, lsn: 1732/47306DF0, prev 1732/47306DC8, desc: DELETE xmax: 1969939280, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 54/ 54, tx: 1969939280, lsn: 1732/47306E28, prev 1732/47306DF0, desc: DELETE xmax: 1969939280, off: 2, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939315, lsn: 1732/4738B258, prev 1732/4738B230, desc: INSERT off: 7, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939352, lsn: 1732/4749E4E0, prev 1732/4749E488, desc: INSERT off: 8, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939352, lsn: 1732/474BF8F0, prev 1732/474BF898, desc: INSERT off: 9, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap2 len (rec/tot): 64/ 186, tx: 0, lsn: 1732/47969908, prev 1732/479698C8, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0 FPW, blkref #1: rel 1663/16710/35182 blk 1
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796A440, prev 1732/4796A3E8, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 2
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796A480, prev 1732/4796A440, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 3
<..>
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796FCC0, prev 1732/4796FC80, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 25
rmgr: Standby len (rec/tot): 90/ 90, tx: 0, lsn: 1732/4797E240, prev 1732/4797E0F8, desc: INVALIDATIONS ; inval msgs: catcache 55 catcache 54 relcache 35182
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939561, lsn: 1732/479C0688, prev 1732/479C0630, desc: INSERT+INIT off: 1, flags: 0x01, blkref #0: rel 1663/16710/35182 blk 1
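
To make the ordering in the excerpt above easier to check, here is a small Python sketch that converts the textual LSNs to integers and sorts three of the records. An LSN is printed as two hexadecimal halves (high and low 32 bits) separated by a slash. The helper name `lsn_to_int` and the hand-copied `records` list are assumptions for illustration; the sketch only restates what the excerpt shows and does not establish a root cause.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '1732/47969908' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# (lsn, record type, heap block) copied by hand from the pg_waldump excerpt.
records = [
    ("1732/479C0688", "Heap INSERT+INIT", 1),
    ("1732/47969908", "Heap2 VISIBLE", 1),
    ("1732/4797E240", "Standby INVALIDATIONS", None),
]

# Sort by numeric LSN to recover the replay order.
ordered = sorted(records, key=lambda r: lsn_to_int(r[0]))
for lsn, desc, blk in ordered:
    print(f"{lsn}: {desc}, heap blk {blk}")

visible = lsn_to_int("1732/47969908")
init = lsn_to_int("1732/479C0688")
print("VISIBLE for blk 1 precedes INSERT+INIT for blk 1:", visible < init)
```

In replay order, the Heap2/VISIBLE record that references heap block 1 (the one the standby panics on) comes before the INSERT+INIT record that initializes block 1.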

[the table structure]

\d+ data.slice_trigger

Partitioned table "data.slice_trigger"
 Column                    | Type                     | Collation | Nullable | Default                                                 | Storage  | Compression | Stats target | Description
---------------------------+--------------------------+-----------+----------+---------------------------------------------------------+----------+-------------+--------------+-------------
 st_id                     | bigint                   |           | not null | nextval('data.slice_trigger_st_id_seq'::regclass)       | plain    |             |              |
 st_site                   | text                     |           | not null |                                                         | extended |             |              |
 st_event_type             | text                     |           | not null |                                                         | extended |             |              |
 st_data_type              | text                     |           |          |                                                         | extended |             |              |
 st_storing_state          | text                     |           |          |                                                         | extended |             |              |
 st_event_id               | bigint                   |           |          | nextval('data.slice_trigger_st_event_id_seq'::regclass) | plain    |             |              |
 st_unit_identifier        | text                     |           | not null |                                                         | extended |             |              |
 st_reservation_identifier | text                     |           |          |                                                         | extended |             |              |
 st_occurred_at            | timestamp with time zone |           | not null | clock_timestamp()                                       | plain    |             |              |
 st_event_json             | text                     |           |          |                                                         | extended |             |              |
 st_process                | text                     |           |          |                                                         | extended |             |              |
Partition key: HASH (st_site, st_unit_identifier)
Indexes:
    "slice_trigger_pkey" PRIMARY KEY, btree (st_id, st_site, st_unit_identifier)
Partitions: data.slice_trigger_0 FOR VALUES WITH (modulus 32, remainder 0),
            data.slice_trigger_1 FOR VALUES WITH (modulus 32, remainder 1),
            data.slice_trigger_10 FOR VALUES WITH (modulus 32, remainder 10),
            data.slice_trigger_11 FOR VALUES WITH (modulus 32, remainder 11),
            data.slice_trigger_12 FOR VALUES WITH (modulus 32, remainder 12),
            data.slice_trigger_13 FOR VALUES WITH (modulus 32, remainder 13),
...

There are plenty of "canceling autovacuum task while truncating relation"
log entries for other partitions of this table, and for other partitioned
tables, both before and after this incident; none of them caused the same
issue.
