From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-06-28 19:20:05
Message-ID: 559048E5.6010406@agliodbs.com
Lists: pgsql-hackers
On 06/28/2015 04:36 AM, Sawada Masahiko wrote:
> On Sat, Jun 27, 2015 at 3:53 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> On 06/26/2015 11:32 AM, Robert Haas wrote:
>>> I think your proposal is worth considering, but you would need to fill
>>> in a lot more details and explain how it works in detail, rather than
>>> just via a set of example function calls. The GUC-based syntax
>>> proposal covers cases like multi-level rules and, now, prioritization,
>>> and it's not clear how those would be reflected in what you propose.
>>
>> So what I'm seeing from the current proposal is:
>>
>> 1. we have several defined synchronous sets
>> 2. each set requires a quorum of k (defined per set)
>> 3. within each set, replicas are arranged in priority order.
>>
>> One thing which the proposal does not implement is *names* for
>> synchronous sets. I would also suggest that if I lose this battle and
>> we decide to go with a single stringy GUC, that we at least use JSON
>> instead of defining our own, proprietary, syntax?
>
> JSON would be more flexible for defining synchronous sets, but it
> would require changing how the configuration file is parsed so that a
> value can contain newlines.
Right. Well, another reason we should be using a system catalog and not
a single GUC ...
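For illustration only: a hypothetical JSON value covering the three properties summarized above (named sets, a per-set quorum k, and an ordered member list) might look like the following. The key names here are invented for the sake of the example, not a proposed spec:

```json
{
  "sync_sets": [
    {
      "name": "boston",
      "quorum": 2,
      "members": ["rep1", "rep2", "rep3"]
    },
    {
      "name": "london",
      "quorum": 1,
      "members": ["rep4", "rep5"]
    }
  ]
}
```

A multi-line value like this is exactly what today's GUC parser can't handle, which is part of the argument for a catalog instead.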
> In this case, if we want to use quorum commit (i.e., all replicas have
> the same priority),
> I guess that we must get an ack from 2 of the listed *elements* (both
> group1 and group2).
> If quorum = 1, we must get an ack from either group1 or group2.
In that case, then priority among quorum groups is pretty meaningless,
isn't it?
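The quorum-of-groups rule being described can be sketched in a few lines of Python. The group layout and names here are my own assumptions for illustration, not anything from the actual patch:

```python
# Each group has its own quorum; a top-level quorum then applies
# across groups.

def group_satisfied(group, acked):
    """A group is satisfied when at least `quorum` of its members have acked."""
    return sum(1 for m in group["members"] if m in acked) >= group["quorum"]

def commit_ok(groups, top_quorum, acked):
    """Commit may proceed when `top_quorum` of the groups are satisfied."""
    satisfied = sum(1 for g in groups if group_satisfied(g, acked))
    return satisfied >= top_quorum

groups = [
    {"members": ["a1", "a2"], "quorum": 1},   # group1
    {"members": ["b1", "b2"], "quorum": 1},   # group2
]

# top-level quorum = 2: need an ack from *both* groups
print(commit_ok(groups, 2, {"a1"}))          # False
print(commit_ok(groups, 2, {"a1", "b2"}))    # True

# top-level quorum = 1: an ack from either group suffices
print(commit_ok(groups, 1, {"b1"}))          # True
```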
>> I'm personally not convinced that quorum and prioritization are
>> compatible. I suggest instead that quorum and prioritization should be
>> exclusive alternatives, that is that a synch set should be either a
>> quorum set (with all members as equals) or a prioritization set (if rep1
>> fails, try rep2). I can imagine use cases for either mode, but not one
>> which would involve doing both together.
>>
>
> Yep, separating the GUC parameter between prioritization and quorum
> could also be a good idea.
We're agreed, then ...
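The distinction between the two modes can be sketched as follows; the standby names, parameter names, and data layout are illustrative assumptions:

```python
def acks_satisfy(mode, standbys, k, acked, connected):
    """Decide whether a commit's wait is satisfied under each mode.

    quorum:   any k acks from the whole set count, members are equals.
    priority: only the first k *connected* standbys, in list order, count;
              if rep1 is down, rep2 takes its place, and so on.
    """
    if mode == "quorum":
        return len(acked & set(standbys)) >= k
    elif mode == "priority":
        active = [s for s in standbys if s in connected][:k]
        return len(active) == k and all(s in acked for s in active)
    raise ValueError(mode)

standbys = ["rep1", "rep2", "rep3"]

# Quorum: any 2 acks satisfy the commit.
print(acks_satisfy("quorum", standbys, 2,
                   acked={"rep2", "rep3"},
                   connected={"rep1", "rep2", "rep3"}))   # True

# Priority with k = 1: rep1 is down, so rep2 is the synchronous standby;
# an ack from rep3 alone does not count.
print(acks_satisfy("priority", standbys, 1,
                   acked={"rep2"}, connected={"rep2", "rep3"}))  # True
print(acks_satisfy("priority", standbys, 1,
                   acked={"rep3"}, connected={"rep2", "rep3"}))  # False
```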
> Also, I think we must make it possible to decide which server should
> be promoted when the master server is down.
Yes, and probably my biggest issue with this patch is that it makes
deciding which server to fail over to *more* difficult (by adding more
synchronous options) without giving the DBA any more tools to decide how
to fail over. Aside from "because we said we'd eventually do it", what
real-world problem are we solving with this patch?
I'm serious. Only if we define the real reliability/availability
problem we want to solve can we decide if the new feature solves it.
I've seen a lot of technical discussion about the syntax for the
proposed GUC, and zilch about what's going to happen when the master
fails, or who the target audience for this feature is.
On 06/28/2015 05:11 AM, Michael Paquier wrote:
> On Sat, Jun 27, 2015 at 2:12 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Finally, while I'm raining on everyone's parade: the mechanism of
>> identifying synchronous replicas by setting the application_name on the
>> replica is confusing and error-prone; if we're building out synchronous
>> replication into a sophisticated system, we ought to think about
>> replacing it.
>
> I assume that you do not refer to a new parameter in the connection
> string like node_name, no? Are you referring to an extension of
> START_REPLICATION in the replication protocol to pass an ID?
Well, if I had my druthers, we'd have a way to map client_addr (or
replica IDs, which would be better, in case of network proxying) *on the
master* to synchronous standby roles. Synch roles should be defined on
the master, not on the replica, because it's the master which is going
to stop accepting writes if they've been defined incorrectly.
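To make the idea concrete, here is a minimal sketch of a master-side mapping from replica identity to synchronous role. The table layout, addresses, and role names are all invented for illustration; nothing like this exists in PostgreSQL today:

```python
# Hypothetical master-side role table, keyed by replica ID (or, less
# robustly, by client_addr).  Defined on the master, since the master is
# what stops accepting writes if the configuration is wrong.

SYNC_ROLES = {
    "10.0.0.11": "sync",
    "10.0.0.12": "sync",
    "10.0.0.13": "async",
}

def role_for(replica_id):
    # Unknown replicas default to async, so a stray standby that copies
    # another node's recovery.conf cannot hijack the sync slot.
    return SYNC_ROLES.get(replica_id, "async")

print(role_for("10.0.0.11"))  # sync
print(role_for("10.0.0.99"))  # async
```

The point of the default is exactly the failure mode described below: a misconfigured joiner degrades to async instead of silently becoming synchronous.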
It's always been a problem that one can accomplish a de-facto
denial-of-service by joining a cluster using the same application_name
as the synch standby, more so because it's far too easy to do that
accidentally. Make the simple mistake of copying recovery.conf from the
synch replica instead of the async replica, and you've created a
reliability problem.
Also, the fact that we use application_name for synch_standby groups
prevents us from giving the standbys in the group their own names for
identification purposes. It's only the fact that synchronous groups are
relatively useless in the current feature set that's prevented this from
being a real operational problem; if we implement quorum commit, then
users are going to want to use groups more often and will want to
identify the members of the group, and not just by IP address.
We *really* should have discussed this feature at PGCon.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com