>
>05.03.2021 15:55, David Steele:
>> On 3/4/21 11:32 PM, Michael Paquier wrote:
>>> On Wed, Mar 03, 2021 at 06:49:02AM -0500, David Steele wrote:
>>>> OK, but shouldn't we have a full page write for this page after the
>>>> backup
>>>> starts, rather than the partial we seem to be seeing here? Is there any
>>>> condition where the full page write would be skipped legitimately,
>>>> or does
>>>> it point to a problem?
>>>
>>> That's how things work. If they don't work this way for physical
>>> backups, we may have a problem. At replay, the full page will be
>>> replayed if BKPIMAGE_APPLY is correctly set, as per
>>> XLogReadBufferForRedoExtended(). And XLogRecordAssemble()
>>> does the decision when building the record (just grep for
>>> needs_backup).
>>
>> That's exactly the problem. This WAL record appears to be the only
>> reference to the page the submitter could find in their WAL. Anton,
>> could you confirm that?
>>
>> If we can confirm that there is no FPI for this page after the initial
>> backup checkpoint, wouldn't that point to an issue?
>>
>>>> If Postgres is running correctly there is certainly no expectation for
>>>> support of this unusual use case, but I do think that this possibly
>>>> points
>>>> to an issue in Postgres, which under normal circumstances would be
>>>> very hard
>>>> to detect.
>>>
>>> Well, the report tells that this is an issue that happens on those
>>> fake files full of zeros, but are you sure that you have the sizing
>>> right? I still don't see any evidence of anything broken based on the
>>> information gathered for full backups, FWIW.
>>
>> This is the size we validate for normal restores so pretty sure it is
>> accurate. Actually, postgres doesn't seem to care if the file is large
>> enough as long as it is present. The reason we precreate the files at
>> the correct size is because when postgres extends the file it is not
>> sparse and uses more space.
>>
>> Regards,
I have reproduced the situation with WAL files and zeroed pgbackrest files.
the same problem started on the first lsn in wal.
I attach the logs recovery process
2021-03-06 00:36:13.784 MSK [95] LOG: database system was interrupted; last known up at 2021-03-01 01:07:51 MSK
2021-03-06 00:36:13.871 MSK [95] LOG: starting point-in-time recovery to 2021-03-01 12:00:00+03
2021-03-06 00:36:13.915 MSK [95] LOG: restored log file "0000000B.history" from archive
2021-03-06 00:36:14.158 MSK [95] LOG: restored log file "0000000B000002710000008A" from archive
2021-03-06 00:36:14.172 MSK [95] LOG: redo starts at 271/8A000028
2021-03-06 00:36: 14.325 MSK [95] WARNING: page 27 of relation base/17946/28008 is uninitialized
2021-03-06 00:36:14.325 MSK [95] WARNING: page 15 of relation base/17946/2840 is uninitialized
2021-03-06 00:36:14.325 MSK [95] PANIC: WAL contains references to invalid pages
2021-03-06 00:36:14.681 MSK [92] LOG: startup process (PID 95) was terminated by signal 6: Aborted
2021-03-06 00:36:14.681 MSK [92] LOG: terminating any other active server processes
2021-03-06 00:36:14.708 MSK [92] LOG: database system is shut down
2021-03-06 00:36:16.175 MSK [9] LOG: database system was interrupted while in recovery at log time 2021-03-01 01:00:11 MSK
2021-03-06 00:36:16.175 MSK [9] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-03-06 00:36:16.263 MSK [9] LOG: starting point-in-time recovery to 2021-03-01 12:00:00+03
2021-03-06 00:36:16.306 MSK [9] LOG: restored log file "0000000B.history" from archive
and if i watch in 0000000B000002710000008A. i see:
bash-4.2$ /usr/pgsql-11/bin/pg_waldump /var/lib/pgsql/11/data/pg_wal/0000000B000002710000008A -e 271/8A009E10
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 271/8A000028, prev 271/89A60D10, desc: RUNNING_XACTS nextXid 361855004 latestCompletedXid 361855003 oldestRunningXid 361855004
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 271/8A000060, prev 271/8A000028, desc: CHECKPOINT_ONLINE redo 271/8A000028; tli 11; prev tli 11; fpw true; xid 0:361855004; oid 29809888; multi 240501; offset 500196; oldest xid 162916086 in DB 17950; oldest multi 1 in DB 17948; oldest/newest commit timestamp xid: 0/0; oldest running xid 361855004; online
rmgr: Heap len (rec/tot): 65/ 2685, tx: 361855004, lsn: 271/8A0000D0, prev 271/8A000060, desc: HOT_UPDATE off 4 xmax 361855004 ; new off 10 xmax 0, blkref #0: rel 1663/17948/24604 blk 4 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 361855004, lsn: 271/8A000B50, prev 271/8A0000D0, desc: COMMIT 2021-03-01 01:00:12.752834 MSK
rmgr: Heap len (rec/tot): 65/ 1941, tx: 361855005, lsn: 271/8A000B78, prev 271/8A000B50, desc: HOT_UPDATE off 3 xmax 361855005 ; new off 5 xmax 0, blkref #0: rel 1663/17948/24604 blk 21 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 361855005, lsn: 271/8A001310, prev 271/8A000B78, desc: COMMIT 2021-03-01 01:00:13.772687 MSK
rmgr: Heap len (rec/tot): 65/ 3745, tx: 361855006, lsn: 271/8A001338, prev 271/8A001310, desc: HOT_UPDATE off 2 xmax 361855006 ; new off 9 xmax 0, blkref #0: rel 1663/17948/24604 blk 1 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 361855006, lsn: 271/8A0021F8, prev 271/8A001338, desc: COMMIT 2021-03-01 01:00:14.865075 MSK
rmgr: Heap len (rec/tot): 73/ 73, tx: 361855007, lsn: 271/8A002220, prev 271/8A0021F8, desc: HOT_UPDATE off 5 xmax 361855007 ; new off 11 xmax 0, blkref #0: rel 1663/17948/24604 blk 4
rmgr: Transaction len (rec/tot): 34/ 34, tx: 361855007, lsn: 271/8A002270, prev 271/8A002220, desc: COMMIT 2021-03-01 01:00:15.887661 MSK
rmgr: Heap len (rec/tot): 59/ 1855, tx: 361855008, lsn: 271/8A002298, prev 271/8A002270, desc: LOCK off 27: xid 361855008: flags 0 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/17946/27953 blk 0 FPW
rmgr: Heap len (rec/tot): 69/ 69, tx: 361855008, lsn: 271/8A0029D8, prev 271/8A002298, desc: HOT_UPDATE off 27 xmax 361855008 KEYS_UPDATED ; new off 28 xmax 361855008, blkref #0: rel 1663/17946/27953 blk 0
rmgr: Heap len (rec/tot): 54/ 8134, tx: 361855008, lsn: 271/8A002A20, prev 271/8A0029D8, desc: INSERT off 115, blkref #0: rel 1663/17946/27924 blk 2602 FPW
rmgr: Btree len (rec/tot): 53/ 3353, tx: 361855008, lsn: 271/8A004A00, prev 271/8A002A20, desc: INSERT_LEAF off 163, blkref #0: rel 1663/17946/27930 blk 11930 FPW
rmgr: Heap len (rec/tot): 54/ 5906, tx: 361855008, lsn: 271/8A005720, prev 271/8A004A00, desc: INSERT off 102, blkref #0: rel 1663/17946/28008 blk 8 FPW
rmgr: Btree len (rec/tot): 53/ 4193, tx: 361855008, lsn: 271/8A006E50, prev 271/8A005720, desc: INSERT_LEAF off 205, blkref #0: rel 1663/17946/28011 blk 335 FPW
rmgr: Btree len (rec/tot): 53/ 6873, tx: 361855008, lsn: 271/8A007EB8, prev 271/8A006E50, desc: INSERT_LEAF off 4, blkref #0: rel 1663/17946/28013 blk 2 FPW
rmgr: Heap len (rec/tot): 59/ 1115, tx: 361855008, lsn: 271/8A0099B0, prev 271/8A007EB8, desc: LOCK off 9: xid 361855008: flags 0 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/17946/27484 blk 2 FPW
for a complete picture, I attach information about the cluster
bash-4.2$ /usr/pgsql-11/bin/pg_controldata /var/lib/pgsql/11/data/
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6725332219754936319
Database cluster state: in archive recovery
pg_control last modified: Sat Mar 6 00:36:16 2021
Latest checkpoint location: 271/8A000060
Latest checkpoint's REDO location: 271/8A000028
Latest checkpoint's REDO WAL file: 0000000B000002710000008A
Latest checkpoint's TimeLineID: 11
Latest checkpoint's PrevTimeLineID: 11
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:361855004
Latest checkpoint's NextOID: 29809888
Latest checkpoint's NextMultiXactId: 240501
Latest checkpoint's NextMultiOffset: 500196
Latest checkpoint's oldestXID: 162916086
Latest checkpoint's oldestXID's DB: 17950
Latest checkpoint's oldestActiveXID: 361855004
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 17948
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Mon Mar 1 01:00:11 2021
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 271/8AFB4028
Min recovery ending loc's timeline: 11
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 1000
max_worker_processes setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 66bd29a825f4f3e8aea908b50f798ba53215e9f074129724e5a0936a51442e4f
--
Vladimir skilyazhnev