Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Date:	2013-01-17 23:48:44
Message-ID:	20130117234844.GC3074@awork2.anarazel.de
Lists:	pgsql-hackers

On 2013-01-18 08:24:31 +0900, Michael Paquier wrote:
> On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> > I encountered the problem that the timeline switch is not performed
> > expectedly.
> > I set up one master, one standby and one cascade standby. All the servers
> > share the archive directory. restore_command is specified in the
> > recovery.conf
> > in those two standbys.
> >
> > I shut down the master, and then promoted the standby. In this case, the
> > cascade standby should switch to new timeline and replication should be
> > successfully restarted. But the timeline was never changed, and the
> > following
> > log messages were kept outputting.
> >
> > sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> > sby2 LOG: replication terminated by primary server
> > sby2 DETAIL: End of WAL reached on timeline 1
> > sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> > sby2 LOG: replication terminated by primary server
> > sby2 DETAIL: End of WAL reached on timeline 1
> > sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> > sby2 LOG: replication terminated by primary server
> > sby2 DETAIL: End of WAL reached on timeline 1
> >
> I am seeing similar issues with master at 88228e6.
> This is easily reproducible by setting up 2 slaves under a master, then
> kill the master. Promote slave 1 and reconnect slave 2 to slave 1, then
> you will notice that the timeline jump is not done.

Can you reproduce that one with 7fcbf6a^ (i.e. before xlogreader got
split off)?

> The replication delays are still here.

That one is caused by this nice bug, courtesy of yours truly:
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 90ba32e..1174493 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8874,7 +8874,7 @@ retry:
 	/* See if we need to retrieve more data */
 	if (readFile < 0 ||
 		(readSource == XLOG_FROM_STREAM &&
-		 receivedUpto <= targetPagePtr + reqLen))
+		 receivedUpto < targetPagePtr + reqLen))
 	{
 		if (StandbyMode)
 		{

I didn't notice because I had a test script inserting data continuously,
so it caused a lag of at most one record...
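
To spell out the off-by-one, here is a minimal standalone sketch in C. It
is not the actual xlog.c code: the helper names need_more_data_old and
need_more_data_new and the simplified XLogRecPtr typedef are made up for
illustration, and it assumes receivedUpto marks the end of the WAL
received so far.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the WAL position type (assumption). */
typedef uint64_t XLogRecPtr;

/*
 * Old condition: asks for more data even when receivedUpto is exactly
 * the end of the requested range, i.e. when everything needed has
 * already arrived.
 */
static bool
need_more_data_old(XLogRecPtr receivedUpto, XLogRecPtr targetPagePtr,
				   uint32_t reqLen)
{
	return receivedUpto <= targetPagePtr + reqLen;
}

/*
 * Fixed condition: asks for more data only when the received WAL ends
 * before the end of the requested range.
 */
static bool
need_more_data_new(XLogRecPtr receivedUpto, XLogRecPtr targetPagePtr,
				   uint32_t reqLen)
{
	return receivedUpto < targetPagePtr + reqLen;
}
```

With targetPagePtr = 0x3000000, reqLen = 128 and receivedUpto =
0x3000080, the old check returns true (wait for more WAL) while the new
one returns false (read what is there). On an idle master nothing more
arrives, so the last record stays unreplayed until the next one is
streamed.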

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
