From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | andres(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hot standby doesn't come up on some situation. |
Date: | 2014-03-05 12:51:36 |
Message-ID: | 53171DD8.9000006@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> After all, I have confirmed that this fixes the problem on crash
> recovery of hot-standby botfor 9.3 and HEAD and no problem was
> found except unreadability :(
Ok, committed. Thanks!
Any concrete suggestions about the readability? Is there some particular
spot that needs clarifying?
> By the way, I moderately want to fix an assertion message to a
> ordinary one. Details are below.
>
> ====
> The server stops with following message during restarting after
> crash requesting archive recovery when the WAL has been produced
> with the wal_level below WAL_LEVEL_HOT_STANDBY.
>
> | TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799)
> | LOG: startup process (PID 7270) was terminated by signal 6: Aborted
>
> Surely this is the consequence of illegal operation but I think
> it is also not a issue of assertion - which fires on something
> wrong in design or quite rare cases(this case ?).
Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion,
because the oldestActiveXID is set in the checkpoint record even though
wal_level is 'archive', or if you simply have assertions disabled, the
system will start up in hot standby mode even though it's not safe.
> So it might be
> better to show message as below on the case.
>
> | FATAL: Checkpoint doesn't have valid oldest active transaction id
> | HINT: Reading WAL might have been written under insufficient wal_level.
>
> This could do in this way,
> ======
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index e3d5e10..bb6922a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6789,7 +6789,13 @@ StartupXLOG(void)
> if (wasShutdown)
> oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> else
> + {
> oldestActiveXID = checkPoint.oldestActiveXid;
> + if (!TransactionIdIsValid(oldestActiveXID))
> + ereport(FATAL,
> + (errmsg("Checkpoint doesn't have valid oldest active transaction id"),
> + errhint("Reading WAL might have been written under insufficient wal_level.")));
> + }
> Assert(TransactionIdIsValid(oldestActiveXID));
>
> /* Tell procarray about the range of xids it has to deal with */
> =====
>
>
> What do you think about this? Feel free dumping this if you feel
> negative on this.
Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
wal_level is 'archive'. So the above patch doesn't fix the whole problem.
The real bug here is that CheckRequiredParameterValues() tests for
InArchiveRecovery, when it should be testing for
ArchiveRecoveryRequested. Otherwise, the checks are not performed when
going through the crash recovery followed by archive recovery. I
should've changed that as part of the commit that added the crash
recovery then archive recovery behavior.
Fixed, thanks for pointing it out!
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2014-03-05 14:11:41 | Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode |
Previous Message | Michael Paquier | 2014-03-05 12:49:30 | Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode |