From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PANIC during crash recovery of a recently promoted standby |
Date: | 2018-05-24 07:57:07 |
Message-ID: | 20180524075707.GE15445@paquier.xyz |
Lists: | pgsql-hackers |
On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:
> Looks like I didn't understand Alvaro's comment when he mentioned it to me
> off-list. But I now see what Michael and Alvaro mean and that indeed seems
> like a problem. I was thinking that the test for (ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
> after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
> RestartPoint may finish after we have written end-of-recovery record, but
> before we're in production and thus the minRecoveryPoint may again be set.
Yeah, I considered this first as well, but I was not confident that
setting minRecoveryPoint to InvalidXLogRecPtr was actually safe across
timeline switches.
So I have spent a good portion of today testing and playing with it to
be confident that this is right, and I have ended up with the
attached. The patch adds a new flag to XLogCtl which marks whether the
control file has been updated after the end-of-recovery record has been
written, so that minRecoveryPoint does not get updated by a
restart point running in parallel.
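To illustrate the idea only, here is a minimal standalone sketch of a flag
that gates minRecoveryPoint updates. It is not the code in the attached
patch: the struct names, the flag name end_of_recovery_done and both
functions are hypothetical, and the locking that the real shared state
would need is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the PostgreSQL WAL position type. */
typedef uint64_t XLogRecPtr;

/* Hypothetical, simplified stand-in for the shared XLogCtl state. */
typedef struct
{
	bool		end_of_recovery_done;	/* hypothetical flag name */
} SketchXLogCtl;

/* Hypothetical, simplified stand-in for the control file contents. */
typedef struct
{
	XLogRecPtr	minRecoveryPoint;
} SketchControlFile;

/*
 * Mark that the control file has been updated after the end-of-recovery
 * record was written.  Real code would hold a lock while doing this.
 */
static void
sketch_mark_end_of_recovery(SketchXLogCtl *ctl)
{
	ctl->end_of_recovery_done = true;
}

/*
 * Advance minRecoveryPoint on behalf of a restart point.  Returns true
 * if the value was moved forward, false if the update was skipped.
 */
static bool
sketch_update_min_recovery_point(const SketchXLogCtl *ctl,
								 SketchControlFile *cf,
								 XLogRecPtr lsn)
{
	/* A restart point finishing late must not touch the value anymore. */
	if (ctl->end_of_recovery_done)
		return false;

	if (cf->minRecoveryPoint < lsn)
	{
		cf->minRecoveryPoint = lsn;
		return true;
	}
	return false;
}
```

With this shape, a restart point that completes after the flag is set
leaves minRecoveryPoint untouched, whatever the control file state is at
that moment.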
I have also reworked the test case you sent, removing the manual sleeps
and replacing them with correct wait points. There is also no point in
waiting after promotion, as pg_ctl promote implies a wait. Another
important thing is that you need to use wal_log_hints = off to see a
crash, while allows_streaming actually sets wal_log_hints to on.
Comments are welcome.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
recovery-panic-michael.patch | text/x-diff | 7.4 KB |