Re: Synch failover WAS: Support for N synchronous standby servers - take 2

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Date: 2015-07-03 09:23:20
Message-ID: CAHGQGwGzv7BHUSYO692ifxXxYrzEkaamO6DfXSBieEGtro_QYw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> On 07/02/2015 12:44 PM, Andres Freund wrote:
>>>> On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
>>>>> So there's two parts to this:
>>>>>
>>>>> 1. I need to ensure that data is replicated to X places.
>>>>>
>>>>> 2. I need to *know* which places data was synchronously replicated to
>>>>> when the master goes down.
>>>>>
>>>>> My entire point is that (1) alone is useless unless you also have (2).
>>>>
>>>> I think there's a good set of use cases where that's really not the case.
>>>
>>> Please share! My plea for use cases was sincere. I can't think of any.
>>>
>>>>> And do note that I'm talking about information on the replica, not on
>>>>> the master, since in any failure situation we don't have the old
>>>>> master around to check.
>>>>
>>>> How would you, even theoretically, synchronize that knowledge to all the
>>>> replicas? Even when they're temporarily disconnected?
>>>
>>> You can't, which is why we need to know, on the replica side, when
>>> the replica thinks it was last synced. That is, a sync timestamp and
>>> LSN from the last time the replica ack'd a sync commit back to the
>>> master successfully. Based on that information, I can make an informed
>>> decision, even if I'm down to one replica.
>>>
>>>>> ... because we would know definitively which servers were in sync. So
>>>>> maybe that's the use case we should be supporting?
>>>>
>>>> If you want automated failover you need a leader election amongst the
>>>> surviving nodes. The replay position is all they need to elect the node
>>>> that's furthest ahead, and that information exists today.
>>>
>>> I can do that already. If quorum synch commit doesn't help us minimize
>>> data loss any better than async replication or the current 1-redundant,
>>> why would we want it? If it does help us minimize data loss, how?
>>
>> In your example of "2" : { "local_replica", "london_server", "nyc_server" },
>> without something like quorum commit, only local_replica is sync
>> and the other two are async. In this case, if the local data center gets
>> destroyed, you need to promote either london_server or nyc_server. But
>> since they are async, they might not have the data that has already been
>> committed on the master. So data loss! Of course, as I said yesterday,
>> they might have all the data, and then no data loss happens at promotion.
>> But the point is that there is no guarantee of that. OTOH, if we use
>> quorum commit, we can guarantee that either london_server or nyc_server
>> has all the data that has been committed on the master.
>>
>> So I think that quorum commit is helpful for minimizing data loss.
>>
>
> Yeah, quorum commit is helpful for minimizing data loss compared
> with today's replication.
> But in your case, how can we know which server we should use as
> the next master after the local data center goes down?
> If we choose the wrong one, we suffer data loss.
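
(For concreteness: in postgresql.conf, the quorum setup from Josh's
example above might be rendered as something like the following. This
is purely a hypothetical sketch; the exact syntax is still under
discussion in this thread and is not implemented anywhere.)

    # hypothetical syntax, not implemented anywhere yet: commit waits
    # for acks from any 2 of the listed standbys
    synchronous_standby_names = '"2" : { "local_replica", "london_server", "nyc_server" }'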

Check the progress of each server, e.g., by using
pg_last_xlog_replay_location(),
and choose the server that is furthest ahead as the new master.
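
For example, a minimal sketch, assuming both candidate standbys are
still reachable (the LSN values below are made-up examples):

    -- run on each surviving standby:
    SELECT pg_last_xlog_replay_location();

    -- then compare the reported LSNs; a positive result means the
    -- first location is ahead of the second:
    SELECT pg_xlog_location_diff('0/5000060', '0/5000000');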

Regards,

--
Fujii Masao
