From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | julian(dot)markwort(at)cybertec(dot)at |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [BUG] recovery of prepared transactions during promotion can fail |
Date: | 2023-06-19 05:25:14 |
Message-ID: | 20230619.142514.373396981484508204.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the report, reproducer and the patches.
At Fri, 16 Jun 2023 16:27:40 +0200, Julian Markwort <julian(dot)markwort(at)cybertec(dot)at> wrote in
> - prepare a transaction
> - crash postgresql
> - create standby.signal file
> - start postgresql, wait for recovery to finish
> - promote
..
> The promotion will fail with a FATAL error, stating that "requested WAL segment .* has already been removed".
> The FATAL error causes the startup process to exit, so postmaster shuts down again.
>
> Here's an exemplary log output, maybe this helps people to find this issue when they search for it online:
> LOG: redo done at 0/15D89B8
> LOG: last completed transaction was at log time 2023-06-16 13:09:53.71118+02
> LOG: selected new timeline ID: 2
> LOG: archive recovery complete
> FATAL: requested WAL segment pg_wal/000000010000000000000001 has already been removed
> LOG: startup process (PID 1650358) exited with exit code 1
Reproduced here.
> The cause of this failure is an oversight (rather obvious in hindsight):
> The renaming of the WAL file (that was last written to before the crash happened) to .partial is done *before* PostgreSQL
> might have to read this very file to recover prepared transactions from it.
> The relevant function calls here are durable_rename() and RecoverPreparedTransactions() in xlog.c .
> This behaviour has - apparently unintentionally - been fixed in PG 15 and upwards (see commit 811051c ), as part of a
> general restructure and reorganization of this portion of xlog.c (see commit 6df1543 ).
I think so, the reordering might have done for some other reasons, though.
> Furthermore, it seems this behaviour does not appear in PG 12 and older, due to another possible bug:
<snip>...
> In PG 13 and newer, the XLogReaderState is reset in XLogBeginRead()
> before reading WAL in XlogReadTwoPhaseData() in twophase.c .
I arraived at the same conclusion.
> In the older releases (PG <= 12), this reset is not done, so the requested LSN containing the prepared transaction can
> (by happy coincidence) be read from in-memory buffers, and PostgreSQL consequently manages to come up just fine (as the
> WAL has already been read into buffers prior to the .partial rename).
> If the older releases also where to properly reset the XLogReaderState, they would also fail to find the LSN on disk, and
> hence PostgreSQL would crash again.
From the perspective of loading WAL for prepared transactions, the
current code in those versions seems fine. Although I suspect Windows
may not like to rename currently-open segments, it's likely acceptable
as the current test set operates without issue.. (I didn't tested this.)
> I've attached patches for PG 14 and PG 13 that mimic the change in PG15 (commit 811051c ) and reorder the crucial events,
> placing the recovery of prepared transactions *before* renaming the file.
It appears to move the correct part of the code to the proper
location, modifying the steps to align with later versions.
> I've also attached recovery test scripts for PG >= 12 and PG <= 11 that can be used to verify that promote after recovery
> with prepared transactions works.
It effectively detects the bug, though it can't be directly used in
the tree as-is. I'm unsure whether we need this in the tree, though.
> My humble opinion is that this fix should be backpatched to PG 14 and PG 13.
I agree with you.
> It's debatable whether the fix needs to be brought back to 12 and older also, as those do not exhibit this issue, but the
> order of renaming is still wrong.
> I'm not sure if there could be cases where the in-memory buffers of the walreader are too small to cover a whole WAL
> file.
> There could also be other issues from operations that require reading WAL that happen after the .partial rename, I
> haven't checked in depth what else happens in the affected codepath.
> Please let me know if you think this should also be fixed in PG 12 and earlier, so I can produce the patches for those
> versions as well.
There's no immediate need to change the versions. However, I would
prefer to backpatch them to the older versions for the following
reasons.
1. Applying this eases future backpatching in this area, if any.
2. I have reservations about renaming possibly-open WAL segments.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2023-06-19 05:37:48 | Re: Allow pg_archivecleanup to remove backup history files |
Previous Message | Michael Paquier | 2023-06-19 05:24:44 | Re: [BUG] recovery of prepared transactions during promotion can fail |