From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-07-01 18:21:47
Message-ID: 55942FBB.4000502@agliodbs.com
Lists: pgsql-hackers
All:
Replying to multiple people below.
On 07/01/2015 07:15 AM, Fujii Masao wrote:
> On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> You're confusing two separate things. The primary manageability problem
>> has nothing to do with altering the parameter. The main problem is: if
>> there is more than one synch candidate, how do we determine *after the
>> master dies* which candidate replica was in synch at the time of
>> failure? Currently there is no way to do that. This proposal plans to,
>> effectively, add more synch candidate configurations without addressing
>> that core design failure *at all*. That's why I say that this patch
>> decreases overall reliability of the system instead of increasing it.
>
> I agree this is a problem even today, but it's basically independent from
> the proposed feature *itself*. So I think that it's better to discuss and
> work on the problem separately. If so, we might be able to provide a
> good way to find a new master even if the proposed feature finally fails
> to be adopted.
I agree that they're separate features. My argument is that the quorum
synch feature isn't materially useful if we don't also create some way to
identify which server(s) were in synch at the time the master died.
The main reason I'm arguing on this thread is that discussion of this
feature went straight into GUC syntax, without ever discussing:
* what use cases are we serving?
* what features do those use cases need?
I'm saying that we need to have that discussion first before we go into
syntax. We gave up on quorum commit in 9.1 partly because nobody was
convinced that it was actually useful; that case still needs to be
established, and if we can determine *under what circumstances* it's
useful, then we can know if the proposed feature we have is what we want
or not.
Myself, I have two use cases for changes to sync rep:
1. the ability to specify a group of three replicas in the same data
center, and have commit succeed if it succeeds on two of them. The
purpose of this is to avoid data loss even if we lose the master and one
replica.
2. the ability to specify that synch needs to succeed on two replicas in
two different data centers. The idea here is to be able to ensure
consistency between all data centers.
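To make those concrete, here's a purely hypothetical sketch of what a
grouped/quorum setting might look like (the standby names and the JSON
shape are invented for illustration; nothing here is settled syntax):

    # use case 1: commit waits for acks from any 2 of 3 replicas in one DC
    synchronous_standby_names = '{"quorum": 2, "standbys": ["local1", "local2", "local3"]}'

    # use case 2: commit waits for one ack from each of two data centers
    synchronous_standby_names = '{"all": [{"quorum": 1, "standbys": ["dc1a", "dc1b"]}, {"quorum": 1, "standbys": ["dc2a", "dc2b"]}]}'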
Speaking of which: how does the proposed patch roll back the commit on
one replica if it fails to get quorum?
On 07/01/2015 07:55 AM, Peter Eisentraut wrote:
> I respect that some people might like this, but I don't really see this
> as an improvement. It's much easier for an administration person or
> program to type out a list of standbys in a text file than having to go
> through these interfaces that are non-idempotent, verbose, and only
> available when the database server is up. The nice thing about a plain
> and simple system is that you can build a complicated system on top of
> it, if desired.
I'm disagreeing that the proposed system is "plain and simple". What we
have now is simple; anything we try to add on top of it is going to be
much less so. Frankly, given the proposed feature, I'm not sure that a
"plain and simple" implementation is *possible*; it's not a simple problem.
On 07/01/2015 07:58 AM, Sawada Masahiko wrote:
> We can have servers with the same application_name today; it's like a group.
> So there are two problems regarding fail-over:
> 1. How can we know which group (set) we should use? (group means
> application_name here)
> 2. And how can we decide which server of that group we should
> promote to the next master server?
Well, one possibility is to have each replica keep a flag which
indicates whether it thinks it's in sync or not. This flag would be
updated every time the replica sends a sync-ack to the master. There are a
couple of issues with that, though:
Synch Flag: the flag would need to be WAL-logged or written to disk
somehow on the replica, to cover the case where the whole data
center shuts down, comes back up, and the master fails on restart. In
order for the replica to WAL-log this, we'd need to add special .sync
files to pg_xlog, like we currently have .history files (a rough sketch
of what such a file might record is below). Such a file could be
updated thousands of times per second, which is potentially an
issue. We could reduce writes by either syncing to disk periodically,
or by having the master write the sync state to a catalog and replicate
it, but ...
Race Condition: there's a bit of a race condition during adverse
shutdown situations which could result in uncertainty, especially during
general data center failures and network failures that might not hit
all servers at the same time. If the master is WAL-logging sync state,
this race condition is much worse, because it's pretty much certain that
one message updating sync state would be lost in the event of a master
crash. Likewise, if we don't log every synch state change, we've
widened the window for a race condition.
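For what it's worth, here's roughly what such a (purely hypothetical,
nonexistent today) sync file on a replica might record:

    # hypothetical pg_xlog/*.sync contents -- invented for illustration
    timeline: 3
    last_sync_ack_lsn: 0/3000060
    in_sync_at_last_ack: true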
> #1 is one of the big problems, I think.
> I haven't come up with a correct solution yet, but we would need to know
> which server (group) is the best one to promote
> without the old master server running.
> For example, improving the pg_stat_replication view, or having a mediation
> process that always checks the progress of each standby.
Well, pg_stat_replication is useless for promotion, because if you need
to do an emergency promotion, you don't have access to that view.
Mind you, adding any additional synch configurations will require either
extra columns in pg_stat_replication or a new system view, but that
doesn't help us with the failover issue.
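For reference, what exists today (on a 9.4/9.5-era server) only works
while the master is alive, and the LSN functions on the standbys tell
you who is furthest ahead, not who was in sync when the master died:

    -- on the master, while it's still up:
    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;

    -- on each surviving standby, after the master is gone:
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();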
> #2, I guess the best solution is that the DBA can promote any server
> of the group.
> That is, the DBA can always promote a server without considering the state
> of the servers in that group.
> It's not difficult if we always use the lowest LSN of a group as the group LSN.
Sure, but if we're going to do that, why use synch rep at all? Let
alone quorum commit?
> Sounds convenient and flexible. I agree with this JSON-format
> parameter only if we don't combine both quorum and prioritization,
> because of backward compatibility.
> I tend toward using a JSON-format value in a new, separate GUC parameter.
Well, we could just detect if the parameter begins with { or not. ;-)
We could also do an end-run around the current GUC code by not
permitting line breaks in the JSON.
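In other words, something like this (standby names invented, JSON shape
purely illustrative), where a leading '{' is enough to tell the two
formats apart:

    # today's format:
    synchronous_standby_names = 'node_a, node_b'

    # hypothetical JSON form, kept on a single line:
    synchronous_standby_names = '{"quorum": 2, "standbys": ["node_a", "node_b", "node_c"]}'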
>> Question: what happens *today* if we have two different synch rep
>> strings in two different *.conf files? I wouldn't assume that anyone
>> has tested this ...
>
> We use the last defined parameter even if sync rep strings appear in
> several files, right?
Yeah, I was just wondering if anyone had tested that.
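The scenario I have in mind is something like this (untested, which is
the point):

    # postgresql.conf
    synchronous_standby_names = 'node_a'
    include 'replication.conf'

    # replication.conf
    synchronous_standby_names = 'node_b, node_c'

If the GUC machinery treats this like any other setting, the last value
read ('node_b, node_c') should win, but I'd rather see that confirmed
than assumed.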
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com