From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | hlinnakangas(at)vmware(dot)com |
Cc: | andres(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hot standby doesn't come up on some situation. |
Date: | 2014-03-03 00:27:22 |
Message-ID: | 20140303.092722.78980564.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote in <53108506(dot)2010200(at)vmware(dot)com>
> > Yes, but the same stuation could be made by restarting crashed
> > secondary.
>
> Yeah.
>
> > I have no idea about the scenario on whitch this behavior was regarded
> > as
> > undesirable but anyway I think that the secondry should start
> > accepting
> > client just after crash recovery is completed.
>
> Agreed, this is a bug.
>
> I don't think your patch is the right fix for this though. Setting
> minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
> points to the end of the last read and replayed record. What's wrong
> in this case is lastReplayedEndRecptr. At the beginning of recovery,
> it's initialized to the REDO point, but with a shutdown checkpoint,
> that's not quite right. When starting from a shutdown checkpoint, REDO
> points to the beginning of the shutdown record, but we've already
> effectively replayed it. The next record we replay is the one after
> the checkpoint.
It's more reasonable. I felt uncelar about that but I forgot to
doubt the correctness of lastReplayedEndRecptr then, but surely
the shutdown record itself was effectively alredy replayed when
the recored is inseretd.
> To see that, I added some elog(LOG) calls:
>
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:18 EET
> LOG: ReadCheckpointRecord: 0/16479C98
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> ^CLOG: received fast shutdown request
> LOG: aborting any active transactions
> LOG: autovacuum launcher shutting down
> LOG: shutting down
> LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
> checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
> 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
> DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
> LOG: database system is shut down
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:23 EET
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> Killed
>
> At this point, the last record is the shutdown checkpoint, beginning
> at 16479D00, and the server has been killed (immediate shutdown).
>
> ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was interrupted; last known up at 2014-02-28
> 14:06:29 EET
> LOG: entering standby mode
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system was not properly shut down; automatic recovery in
> progress
> LOG: record with zero length at 0/16479D68
> LOG: reached end of WAL in pg_xlog, entering archive recovery
> LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
> FATAL: could not connect to the primary server: could not connect to
> server: Connection refused
> ...
>
> Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
> is set to the *beginning* of the checkpoint record, even though the
> checkpoint record has already been effectively replayed, by the feat
> of starting recovery from it. EndRecPtr correctly points to the end of
> the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
> value, the CheckRecoveryConsistency() call concludes that it's not
> consistent.
I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)
I agree that the fix is appropriate.
> I believe the attached fix is the right way to fix this.
It also worked for me. Thank you.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro HORIGUCHI | 2014-03-03 00:40:00 | Re: Hot standby doesn't come up on some situation. |
Previous Message | Noah Misch | 2014-03-02 22:38:38 | Re: Securing "make check" (CVE-2014-0067) |