Re: Standalone synchronous master

From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 01:01:21
Message-ID: 52CDF4E1.8000604@nasby.net
Lists: pgsql-hackers

On 1/8/14, 6:05 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 01/08/2014 03:27 PM, Tom Lane wrote:
>>> What we lack, and should work on, is a way for sync mode to have M larger
>>> than one. AFAICS, right now we'll report commit as soon as there's one
>>> up-to-date replica, and some high-reliability cases are going to want
>>> more.
>> "Sync N times" is really just a guarantee against data loss as long as
>> you lose N-1 servers or fewer. And it becomes an even
>> lower-availability solution if you don't have at least N+1 replicas.
>> For that reason, I'd like to see some realistic actual user demand
>> before we take the idea seriously.
> Sure. I wasn't volunteering to implement it, just saying that what
> we've got now is not designed to guarantee data survival across failure
> of more than one server. Changing things around the margins isn't
> going to improve such scenarios very much.
>
> It struck me after re-reading your example scenario that the most
> likely way to figure out what you had left would be to see if some
> additional system (think Nagios monitor, or monitors) had records
> of when the various database servers went down. This might be
> what you were getting at when you said "logging", but the key point
> is it has to be logging done on an external server that could survive
> failure of the database server. postmaster.log ain't gonna do it.

Yeah, and I think that the logging command that was suggested allows for that *if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes of a single failure: Master fails, you've got the replica. Replica fails, you've got the master.

But fit hits the shan as soon as you get a double failure, and that double failure can be very subtle. Josh's case is not subtle: You lost power AND the master died. You KNOW you have two failures.

But what happens if there's a network blip that's not large enough to notice (but large enough to degrade your replication) and the master dies? Now you have no clue if you've lost data.

Compare this to async: if the master goes down (one failure), you have zero clue whether you lost data or not. At least with auto-degradation you know it takes 2 failures to suffer data loss.
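
A minimal sketch of that comparison, assuming a toy model that is not taken from any PostgreSQL code: there are only two failure events, the replica is either guaranteed current or not, and a dead master takes its unreplicated commits with it. The mode names, event names, and both functions are hypothetical and exist only for illustration.

```python
from itertools import permutations

# Hypothetical failure events for the toy model.
EVENTS = ("replica_unreachable", "master_dies")


def may_lose_commits(mode, events):
    """Return True if acknowledged commits may be lost after `events`."""
    # Async: the replica may lag at any moment, so it is never guaranteed
    # current. Sync with auto-degrade: it is current until the master
    # falls back to running alone.
    replica_current = mode == "sync_auto_degrade"
    for event in events:
        if event == "replica_unreachable" and mode == "sync_auto_degrade":
            replica_current = False  # master degrades and commits alone
        elif event == "master_dies":
            return not replica_current
    return False  # master is still alive, so nothing is lost


def min_failures_for_loss(mode):
    """Smallest number of failures after which commits may be lost."""
    for n in range(1, len(EVENTS) + 1):
        for sequence in permutations(EVENTS, n):
            if may_lose_commits(mode, sequence):
                return n
    return None


for mode in ("async", "sync_auto_degrade"):
    print(mode, min_failures_for_loss(mode))
```

Under these assumptions async reports 1 and auto-degrade reports 2. The model says nothing about whether you can tell afterwards that the degrade happened, which is the part that needs external logging.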
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
