From: | Полина Бунгина <bungina(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | pg_rewind WAL segments deletion pitfall |
Date: | 2022-08-23 15:46:30 |
Message-ID: | CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
Hello,
It seems to me that there is currently a pitfall in the pg_rewind
implementation.
Imagine the following situation:
There is a cluster consisting of a primary configured with
wal_level='replica' and archive_mode='on', and a replica.
1. The primary is not fast enough at archiving WAL segments (e.g. because of
network issues or high CPU/disk load)
2. The primary fails
3. The replica is promoted
4. We are unlucky: the timelines of the new and the old primary have
diverged, so we need to run pg_rewind
5. We are even less lucky: the old primary still has some WAL segments
with .ready status files that were generated before the point of divergence
and were not archived (e.g. 000000020004D20200000095.done,
000000020004D20200000096.ready, 000000020004D20200000097.ready,
000000020004D20200000098.ready)
6. The promoted primary runs for some time and recycles the old WAL
segments.
7. We revive the old primary and try to rewind it
8. When pg_rewind finishes successfully, we see that the WAL segments
with .ready files have been removed, because they were already absent on the
promoted replica. We end up completely losing some WAL segments, even
though we had a clear sign that they were not archived and, more
importantly, pg_rewind read these segments while collecting information
about the data blocks.
9. The old primary fails to start because of the missing WAL segments
(more precisely, the records between the last common checkpoint and the
point of divergence) with the following log record: "ERROR: requested WAL
segment 000000020004D20200000096 has already been removed"
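To make step 5 concrete, here is a minimal sketch of how one could list the segments that still carry a .ready status file on the old primary before running pg_rewind. It only assumes that the status files are named after the segment with a .ready or .done suffix, as in the example above; the function name and the directory argument are hypothetical and not part of pg_rewind.

```python
import os


def unarchived_segments(archive_status_dir):
    """Return the WAL file names that still have a .ready status file.

    archive_status_dir is assumed to be the pg_wal/archive_status
    directory of the old primary (hypothetical argument).
    """
    suffix = ".ready"
    names = []
    for entry in os.listdir(archive_status_dir):
        # A .ready file means the segment is waiting to be archived;
        # .done files are skipped.
        if entry.endswith(suffix):
            names.append(entry[: -len(suffix)])
    return sorted(names)
```

Any name returned here is a segment that exists only on the old primary until the archiver catches up, so it may be worth copying aside before the rewind.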
In this situation, after pg_rewind, the following segments are archived:
000000020004D20200000095
000000020004D20200000099.partial
000000030004D20200000099
the following segments are lost:
000000020004D20200000096
000000020004D20200000097
000000020004D20200000098
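The bookkeeping above can be written as a plain set difference. This is only an illustration of the reasoning, not something pg_rewind does: the helper name is hypothetical, and the contents of the promoted replica's pg_wal are an assumption (the old timeline-2 segments are taken to be already recycled there).

```python
def lost_segments(ready_on_target, present_on_source, archived):
    """Segments that only the target holds: not archived, not on the source."""
    return sorted(set(ready_on_target) - set(present_on_source) - set(archived))


# Segments with .ready files on the old primary (from step 5).
ready_on_target = [
    "000000020004D20200000096",
    "000000020004D20200000097",
    "000000020004D20200000098",
]
# What the archive contains after pg_rewind.
archived = [
    "000000020004D20200000095",
    "000000020004D20200000099.partial",
    "000000030004D20200000099",
]
# Assumption: the promoted replica has already recycled the timeline-2 segments.
present_on_source = ["000000030004D20200000099"]

print(lost_segments(ready_on_target, present_on_source, archived))
```

With these inputs the result is exactly the three segments listed as lost.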
So my question is: why can't pg_rewind be a little smarter when building
the file map for WAL? Could it preserve, on the target, the WAL segments
that contain those potentially lost records (after the last common
checkpoint and before the point of divergence)? (See the attached patch.)
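To illustrate the idea (this is not the attached patch, which is C code inside pg_rewind and is not reproduced here), the set of segments to preserve can be derived from the two LSNs alone. The sketch assumes the default 16 MB WAL segment size and the usual timeline/log/segment file naming; the function names and the LSN values in the usage note are hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli, segno, seg_size=WAL_SEGMENT_SIZE):
    """Build the 24-character WAL file name for a timeline and segment number."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def segments_to_preserve(last_common_checkpoint, divergence, tli,
                         seg_size=WAL_SEGMENT_SIZE):
    """Names of the target's segments covering checkpoint..divergence.

    Both ends are included, which is slightly conservative when the
    divergence point falls exactly on a segment boundary.
    """
    first = parse_lsn(last_common_checkpoint) // seg_size
    last = parse_lsn(divergence) // seg_size
    return [wal_file_name(tli, s, seg_size) for s in range(first, last + 1)]
```

For example, with a made-up last common checkpoint at 4D202/96000028 and a divergence point at 4D202/99000100 on timeline 2, `segments_to_preserve("4D202/96000028", "4D202/99000100", 2)` covers segments 000000020004D20200000096 through 000000020004D20200000099, which includes the three segments lost in the scenario above.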
If I am missing something, however, please correct me or explain why this
straightforward solution cannot be implemented.
Thank you,
Polina Bungina
Attachment | Content-Type | Size |
---|---|---|
v1-0001-pg_rewind-wal-deletion.patch | application/octet-stream | 5.4 KB |