From: Andres Freund <andres(at)anarazel(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Date: 2015-07-03 17:44:22
Message-ID: 20150703174422.GN3291@awork2.anarazel.de
Lists: pgsql-hackers
On 2015-07-03 10:27:05 -0700, Josh Berkus wrote:
> On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
> > Thanks. So we can choose the next master server by checking the
> > progress of each server, if hot standby is enabled.
> > And such a procedure is needed even with today's replication.
> >
> > I think that problem #2, which Josh pointed out, seems to be solved:
> > 1. I need to ensure that data is replicated to X places.
> > 2. I need to *know* which places data was synchronously replicated
> > to when the master goes down.
> > And we can address problem #1 using quorum commit.
>
> It's not solved. I still have zero ways of knowing if a replica was in
> sync or not at the time the master went down.
What?
You pick the standby that's furthest ahead. And you use a high enough
quorum so that, given your tolerance for failures, you'll always be able
to reach at least one of the synchronous replicas. Then you promote the
one with the highest LSN. Done.
This is something that gets *easier* by quorum, not harder.
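Roughly, and only as a sketch (these are the existing 9.x functions; how
you collect and compare the results is up to your failover tooling):

  -- run on each candidate standby once the master is gone
  SELECT pg_last_xlog_receive_location() AS received,
         pg_last_xlog_replay_location()  AS replayed,
         pg_last_xact_replay_timestamp() AS last_commit_ts;

Promote whichever candidate reports the highest location.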
> I forked the subject line because I think that the inability to
> identify synch replicas under failover conditions is a serious problem
> with synch rep *today*, and pretending that it doesn't exist doesn't
> help us even if we don't fix it in 9.6.
That's just not how failovers can sanely work. And again, you *have* the
information you can have on the standbys already. You *know* what the
last replayed xact is and when it was committed.
> Let me give you three cases where our lack of information on the replica
> side about whether it thinks it's in sync or not causes synch rep to
> fail to protect data. The first case is one I've actually seen in
> production, and the other two are hypothetical but entirely plausible.
>
> Case #1: two synchronous replica servers have the application name
> "synchreplica". An admin uses the wrong Chef template, and deploys a
> server which was supposed to be an async replica with the same
> recovery.conf template, and it ends up in the "synchreplica" group as
> well. Due to restarts (pushing out an update release), the new server
> ends up seizing and keeping sync. Then the master dies. Because the new
> server wasn't supposed to be a sync replica in the first place, it is
> not checked; they just fail over to the furthest ahead of the two
> original synch replicas, neither of which was actually in synch.
Nobody can protect you against such configuration errors. We can make it
harder to misconfigure, sure, but it doesn't have anything to do with
the topic at hand.
> Case #2: "2 { local, london, nyc }" setup. At 2am, the links between
> data centers become unreliable, such that the on-call sysadmin disables
> synch rep because commits on the master are intolerably slow. Then, at
> 10am, the links between data centers fail entirely. The day shift, not
> knowing that the night shift disabled sync, fail over to London thinking
> that they can do so with zero data loss.
As I said earlier, you can guard against that today by checking the last
replayed timestamp: SELECT pg_last_xact_replay_timestamp();
You don't have to pick the one that used to be a sync replica. You pick
the one with the most data received.
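As a sketch of what the day shift could run on the London standby (name
from your example) before promoting it:

  SELECT pg_last_xact_replay_timestamp() AS last_replayed_commit,
         now() - pg_last_xact_replay_timestamp() AS time_since_last_replay;

An eight hour gap there tells you that node was not being kept current,
no matter what the (now dead) master believed.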
If the day shift doesn't bother to check the standbys now, they'd not
check either if they had some way to check whether a node was the chosen
sync replica.
> Case #3 "1 { london, frankfurt }, 1 { sydney, tokyo }" multi-group
> priority setup. We lose communication with everything but Europe. How
> can we decide whether to wait to get sydney back, or to promote London
immediately?
You normally don't continue automatically at all in that situation. To
avoid/minimize data loss you want to have a majority election system to
select the new primary. That requires reaching the majority of the
nodes. This isn't something specific to postgres; if you look at any
solution out there, they're all doing it that way.
Statically choosing which of the replicas in a group is the current sync
one is a *bad* idea. You want to ensure that at least one node in a group
has received the data, and stop waiting as soon as that's the case.
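With the quorum shorthand being discussed in this thread (syntax not
final), that would be something like:

  # postgresql.conf, hypothetical quorum spec from upthread
  synchronous_standby_names = '2 { local, london, nyc }'

i.e. a commit returns once any two of the three have acknowledged it,
rather than waiting on one statically designated node.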
> It's an issue *now* that the only data we have about the state of sync
> rep is on the master, and dies with the master. And it severely limits
> the actual utility of our synch rep. People implement synch rep in the
> first place because the "best effort" of asynch rep isn't good enough
> for them, and yet when it comes to failover we're just telling them
> "give it your best effort".
We don't tell them that, but apparently you do.
This subthread is getting absurd, stopping here.