.wal file gets modified after .ready file is created

From: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: .wal file gets modified after .ready file is created
Date: 2023-03-29 19:42:31
Message-ID: 6376e0ba308a4344a77727ab6d9d126d@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Everyone,

We are seeing that .wal modification time is after .ready file is created.
[xyz/db/pg_wal]$ stat -c %Y archive_status/000000010000027100000018.ready
1678538736
[xyz/db/pg_wal]$ stat -c %Y 000000010000027100000018
1678538737

(postgresql binary does include this commit
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24fd9bbca7a20315586de877d74923)

This only happens when there is shutdown/fast shutdown
(I have attached postgres logs (with some path information changed))
12:42:49 UTC::@:[372]:LOG: starting PostgreSQL 14.5 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red ), 64-bit
12:42:49 UTC::@:[372]:LOG: listening on IPv4 address "0.0.0.0", port 5432
12:42:49 UTC::@:[372]:LOG: listening on IPv6 address "::", port 5432
12:42:49 UTC::@:[372]:LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
12:42:49 UTC::@:[374]:LOG: database system was shut down at 12:33:33 UTC
12:42:49 UTC::@:[372]:LOG: database system is ready to accept connections
12:44:12 UTC::@:[372]:LOG: received SIGHUP, reloading configuration files
12:44:12 UTC::@:[372]:LOG: skipping missing configuration file "changed_path/recovery.conf"
12:44:12 UTC::@:[380]:LOG: skipping missing configuration file "changed_path/recovery.conf"
12:45:37 UTC::@:[372]:LOG: received fast shutdown request
12:45:37 UTC::@:[372]:LOG: aborting any active transactions
12:45:37 UTC::@:[372]:LOG: background worker "logical replication launcher" (PID 381) exited with exit code 1
12:45:37 UTC::@:[375]:LOG: shutting down
12:45:36 UTC::@:[375]:LOG: checkpoint starting: shutdown immediate
12:45:36 UTC::@:[379]:LOG: archive command failed with exit code 1
12:45:36 UTC::@:[379]:DETAIL: The failed archive command was: changed_path/rds_wal_archive pg_wal/0027100000018
12:45:36 UTC::@:[375]:LOG: checkpoint complete: wrote 7 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.=0.011 s, total=0.035 s; sync files=6, longest=0.007 s, average=0.002 s; distance=65536 kB, estimate=65536 kB
12:45:36 UTC::@:[379]:LOG: archive command failed with exit code 1
12:45:36 UTC::@:[379]:DETAIL: The failed archive command was: changed_path/rds_wal_archive pg_wal/0027100000018
12:45:37 UTC::@:[379]:LOG: archive command failed with exit code 1
12:45:37 UTC::@:[379]:DETAIL: The failed archive command was: changed_path/rds_wal_archive pg_wal/0027100000018
12:45:37 UTC::@:[379]:WARNING: archiving write-ahead log file "000000010000027100000018" failed too many times, will try
12:45:40 UTC::@:[372]:LOG: database system is shut down
12:45:44 UTC::@:[372]:LOG: listening on IPv4 address "0.0.0.0", port 5432
12:45:44 UTC::@:[372]:LOG: listening on IPv6 address "::", port 5432
12:45:44 UTC::@:[372]:LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
12:45:44 UTC::@:[374]:LOG: database system was shut down at 12:45:36 UTC
12:45:44 UTC::@:[372]:LOG: database system is ready to accept connections
12:45:44 UTC::@:[379]:LOG: archive command failed with exit code 1
12:45:44 UTC::@:[379]:DETAIL: The failed archive command was: changed_path/rds_wal_archive pg_wal/000000010000027100000018
12:45:45 UTC::@:[379]:LOG: archive command failed with exit code 1
12:45:45 UTC::@:[379]:DETAIL: The failed archive command was: changed_path/rds_wal_archive pg_wal/000000010000027100000018
12:45:46 UTC::@:[379]:LOG: archive command failed with exit code 1

pg_waldump of failing and next wal log

failing wal_log
pg_waldump 000000010000027100000018 | tail
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 271/60000B00, prev 271/60000AA8, desc: RUNNING_XACTS nextXid 80725 latestCompletedXid 80724 oldestRunningXid 80725
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 271/60000B38, prev 271/60000B00, desc: SWITCH

next wal_log
pg_waldump 000000010000027100000019 | head
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 271/64000028, prev 271/60000B38, desc: CHECKPOINT_SHUTDOWN redo 271/64000028; tli 1; prev tli 1; fpw true; xid 0:80725; oid 172060; multi 1; offset 0; oldest xid 549 in DB 16406; oldest multi 1 in DB 16406; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 271/640000A0, prev 271/64000028, desc: RUNNING_XACTS nextXid 80725 latestCompletedXid 80724 oldestRunningXid 80725
rmgr: Heap len (rec/tot): 67/ 1352, tx: 80726, lsn: 271/640000D8, prev 271/640000A0, desc: HOT_UPDATE off 28 xmax 80726 flags 0x00 ; new off 32 xmax 0, blkref #0: rel 1664/0/1260 blk 0 FPW

wal_dump of failing and next wal log looks normal to me.
I dont think there is any contrecord involved because if it was we would have seen
a corrosponding log entry. So maybe we are hitting a corner case of commit [1] ?.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24fd9bbca7a20315586de877d74923

Regards
Sachin
Amazon Web Services

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2023-03-29 21:10:07 Re: NEED RPM FILE OF LATEST POSTGRE supported for AIX 7.2
Previous Message Dave Bothwell 2023-03-29 16:55:19 Re: CTE subquery referencing phantom records