From: | 와따가따 <lght2000(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Is it possible for a WAL file to be missing records? |
Date: | 2025-03-09 07:55:48 |
Message-ID: | CAAEzU5vAfo0gD2u=TWAjFoeQLjHdTTaypM85mx+rQKHE8ht1OA@mail.gmail.com |
Lists: | pgsql-bugs |
PostgreSQL version and HA extension in use:
- PostgreSQL 13.10
- pg_auto_failover 2.0
CPU usage and system load were rising under a heavy workload.
A failover was triggered while a large number of WALWrite wait events
were occurring on the primary DB.
I confirmed that the secondary's failure to promote was a
pg_auto_failover issue.
I promoted the secondary manually.
I then tried to rejoin the old primary as a new secondary using the
archived WAL files, but a WAL record seemed to be missing.
So I opened the WAL file with pg_waldump and confirmed that a record was
missing.
It was not a DB server crash.
Can records fail to be written to the WAL file when a failover is
performed under high load?
I'm wondering whether this should be considered a bug, or whether this is
a situation in which WAL records can be lost.
Below is the information I confirmed from the DB log and pg_waldump,
along with some DB settings.
hot_standby_feedback = on
hot_standby = on
synchronous_commit = on
wal_writer_flush_after = 1MB
wal_sync_method = fdatasync
wal_writer_delay = 200ms
wal_buffers = 16MB
wal_segment_size = 16MB
*[When the first failover occurs]*
*- WAL apply DB log*
[image: image.png]
*- Check the WAL records using pg_waldump*
I verified that no LSNs are missing in 0000000300005015000000A6 and
0000000300005015000000A7.
However, the prev LSN shown in 0000000300005015000000A8 is not found in
0000000300005015000000A7.
- The last LSN of 0000000300005015000000A7 is 5015/A6003778.
- The prev LSN of the first record of 0000000300005015000000A8 is
5015/A7FFED78.
[image: image.png]
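
The comparison above depends on which segment file each LSN belongs to. A minimal sketch of that mapping in Python, assuming the 16MB wal_segment_size from the settings listed earlier and timeline 3 as read from the file names. The helper names `parse_lsn` and `wal_file_name` are illustrative and not part of any PostgreSQL tool:

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN string such as '5015/A7FFED78' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, segment_size: int = 16 * 1024 * 1024) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    # Number of segments that fit in one 4GB "xlogid" (256 for 16MB segments).
    segments_per_xlogid = 0x100000000 // segment_size
    segno = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


print(wal_file_name(3, "5015/A7FFED78"))  # 0000000300005015000000A7
print(wal_file_name(3, "5015/A6003778"))  # 0000000300005015000000A6
```

Under this mapping, 5015/A7FFED78 falls in segment ...A7, while 5015/A6003778 falls in segment ...A6.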
*[When the second failover occurs]*
*- DB log*
[image: image.png]
*- Check the WAL records using pg_waldump*
The last LSN of 000000030000501E0000008E is 501E/8EFFCED8.
The prev LSN of the first record in 000000030000501E0000008F is
501E/8EFFEEC8.
Given the large difference between these two LSNs, records appear to have
been lost.
[image: image.png]
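
The distance between two LSNs can be worked out directly, since an LSN is a byte position in the WAL stream. A minimal sketch in Python using the LSN pairs quoted in this message. The helper names `parse_lsn` and `lsn_gap_bytes` are illustrative only:

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN string such as '501E/8EFFCED8' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_gap_bytes(earlier: str, later: str) -> int:
    """Return the number of bytes between two LSNs (later minus earlier)."""
    return parse_lsn(later) - parse_lsn(earlier)


# Second failover: last LSN seen in ...8E vs. prev LSN of first record in ...8F
print(lsn_gap_bytes("501E/8EFFCED8", "501E/8EFFEEC8"))  # 8176

# First failover: last LSN seen in ...A7 vs. prev LSN of first record in ...A8
print(lsn_gap_bytes("5015/A6003778", "5015/A7FFED78"))  # 33535488
```

The result is only a distance in bytes. Whether a given gap is normal depends on the sizes of the records involved, which this sketch does not examine.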