Re: Hot standby doesn't come up on some situation.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnakangas(at)vmware(dot)com
Cc: andres(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot standby doesn't come up on some situation.
Date: 2014-03-03 00:27:22
Message-ID: 20140303.092722.78980564.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote in <53108506(dot)2010200(at)vmware(dot)com>
> > Yes, but the same situation could be made by restarting crashed
> > secondary.
>
> Yeah.
>
> > I have no idea about the scenario in which this behavior was regarded
> > as undesirable, but anyway I think that the secondary should start
> > accepting clients just after crash recovery is completed.
>
> Agreed, this is a bug.
>
> I don't think your patch is the right fix for this though. Setting
> minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
> points to the end of the last read and replayed record. What's wrong
> in this case is lastReplayedEndRecPtr. At the beginning of recovery,
> it's initialized to the REDO point, but with a shutdown checkpoint,
> that's not quite right. When starting from a shutdown checkpoint, REDO
> points to the beginning of the shutdown record, but we've already
> effectively replayed it. The next record we replay is the one after
> the checkpoint.

That is more reasonable. I felt unclear about that, but at the time
I forgot to doubt the correctness of lastReplayedEndRecPtr. Surely
the shutdown record itself was effectively already replayed when
the record was inserted.

> To see that, I added some elog(LOG) calls:
>
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:18 EET
> LOG: ReadCheckpointRecord: 0/16479C98
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> ^CLOG: received fast shutdown request
> LOG: aborting any active transactions
> LOG: autovacuum launcher shutting down
> LOG: shutting down
> LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
> checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
> 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
> DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
> LOG: database system is shut down
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:23 EET
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> Killed
>
> At this point, the last record is the shutdown checkpoint, beginning
> at 16479D00, and the server has been killed (immediate shutdown).
>
> ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was interrupted; last known up at 2014-02-28
> 14:06:29 EET
> LOG: entering standby mode
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system was not properly shut down; automatic recovery in
> progress
> LOG: record with zero length at 0/16479D68
> LOG: reached end of WAL in pg_xlog, entering archive recovery
> LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
> FATAL: could not connect to the primary server: could not connect to
> server: Connection refused
> ...
>
> Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
> is set to the *beginning* of the checkpoint record, even though the
> checkpoint record has already been effectively replayed, by the feat
> of starting recovery from it. EndRecPtr correctly points to the end of
> the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
> value, the CheckRecoveryConsistency() call concludes that it's not
> consistent.

I now completely understand the behavior, thanks to your detailed
explanation (and I learned how to use log messages effectively :-)

I agree that the fix is appropriate.
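
To restate the point with the numbers from the log output quoted above,
here is a minimal sketch in Python. It is only a simplified model, not
the server code: parse_lsn and is_consistent are hypothetical helper
names, the check is reduced to the single comparison of
minRecoveryPoint against lastReplayedEndRecPtr, and the "after the fix"
case assumes that lastReplayedEndRecPtr is initialized to the end of
the shutdown checkpoint record, as described in the explanation.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_consistent(min_recovery_point: int, last_replayed_end: int) -> bool:
    """Simplified model: consistent once replay has reached minRecoveryPoint."""
    return min_recovery_point <= last_replayed_end


# Values taken from the quoted log output.
checkpoint_start = parse_lsn("0/16479D00")  # start of the shutdown checkpoint record
end_rec_ptr = parse_lsn("0/16479D68")       # EndRecPtr: end of that record

# minRecoveryPoint is set to EndRecPtr, which is the right thing to do.
min_recovery_point = end_rec_ptr

# Before the fix: lastReplayedEndRecPtr points at the *beginning* of the record.
before_fix = is_consistent(min_recovery_point, checkpoint_start)

# Assumed behavior after the fix: it points at the *end* of the record.
after_fix = is_consistent(min_recovery_point, end_rec_ptr)

# Distance between the two pointers, in bytes.
gap = end_rec_ptr - checkpoint_start

print(before_fix, after_fix, gap)
```

In this model the check fails only because lastReplayedEndRecPtr lags
behind EndRecPtr by the size of the checkpoint record, which matches
the EndRecPtr and lastReplayedEndRecPtr values printed in the log.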

> I believe the attached fix is the right way to fix this.

It also worked for me. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
