From: | Kuwamura Masaki <kuwamura(at)db(dot)is(dot)i(dot)nagoya-u(dot)ac(dot)jp> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pg_rewind with cascade standby doesn't work well |
Date: | 2023-09-20 02:46:45 |
Message-ID: | CAMyC8qryE7mKyvPvGHCt5GpANAmp8sS_tRbraqXcPBx14viy6g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>> IMO a test is needed that makes sure no one is going to break this in
>> the future.
>
> You definitely need more complex test scenarios for that. If you can
> come up with new ways to make the TAP tests of pg_rewind mode modular
> in handling more complicated node setups, that would be a nice
> addition, for example.
I'm sorry for lacking tests. For now, I started off with a simple test
that cause the problem I mentioned. The updated WIP patch 0001 includes
the new test for pg_rewind.
And also, I'm afraid that I'm not sure what kind of tests I have to make
for fix this behavior. Would you mind giving me some advice?
> This patch is at least incorrect in its handling of crash recovery,
> because these two should *never* be set in this case as we want to
> replay up to the end of WAL. For example, see xlog.c or the top of
> xlogrecovery.c about the assumptions behind these variables:
> /* crash recovery should always recover to the end of WAL */
> ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> ControlFile->minRecoveryPointTLI = 0;
>
> If an end-of-recovery record is replayed during crash recovery, these
> assumptions are plain broken.
That make sense! I really appreciate your knowledgeable review.
> One thing that we could consider is to be more aggressive with
> restartpoints when replaying this record for a standby, see a few
> lines above the lines added by your patch, for example. And we could
> potentially emulate a post-promotion restart point to get a refresh of
> the control file as it should, with the correct code paths involved in
> the updates of minRecoveryPoint when the checkpointer does the job.
I'm not confident but you meant we could make restartpoint
(i.e., call `RequestCheckpoint()`) instead of my old patch?
The patch 0001 also contains my understanding.
I also found a bug (maybe). If we call `CreateRestartPoint()` during
crash-recovery, the assertion fails at ComputeXidHorizon() in procarray.c.
It's inherently orthogonal to the problem I already reported. So you can
reproduce this at HEAD with this procedure.
1. Start primary and standby server
2. Modify checkpoint_timeout to 1h on standby
3. Insert 10^10 records and concurrently run CHECKPOINT every second on
primary
4. Do an immediate stop on both standby and primary at the end of the insert
5. Modify checkpoint_timeout to 30 on standby
6. Remove standby.signal on standby
7. Restart standby (it will start crash-recovery)
8. Assertion failure is raised shortly
I think this is because `TruncateSUBTRANS();` in `CreateRestartPoint()` is
called but `StartupSUBTRANS()` isn't called yet. In `StartupXLOG()`, we
call
`StartupSUBTRANS()` if `(ArchiveRecoveryRequested && EnableHotStandby)`.
However, in `CreateRestartPoint()`, we call `TruncateSUBTRANS()` if
`(EnableHotStandby)`. I guess the difference causes this bug. The latter
possibly be called even crash-recovery while former isn't.
The attached patch 0002 fixes it. I think we could discuss about this bug
in
another thread if needed.
Best regards!
Masaki Kuwamura
Attachment | Content-Type | Size |
---|---|---|
v2-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch | application/octet-stream | 5.4 KB |
v1-0002-Fix-restartpoint-during-crash-recovery.patch | application/octet-stream | 593 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Jungwirth | 2023-09-20 02:50:10 | Re: SQL:2011 application time |
Previous Message | Nathan Bossart | 2023-09-20 02:30:33 | Re: Inefficiency in parallel pg_restore with many tables |