From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Support for N synchronous standby servers - take 2 |
Date: | 2015-06-29 08:01:10 |
Message-ID: | CAB7nPqQdS7wmPVXqJxF7ZgTM0L-mxM0-ohadL7=e0+UjjpsJGw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jun 29, 2015 at 4:20 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 06/28/2015 04:36 AM, Sawada Masahiko wrote:
>> On Sat, Jun 27, 2015 at 3:53 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> On 06/26/2015 11:32 AM, Robert Haas wrote:
>>>> I think your proposal is worth considering, but you would need to fill
>>>> in a lot more details and explain how it works in detail, rather than
>>>> just via a set of example function calls. The GUC-based syntax
>>>> proposal covers cases like multi-level rules and, now, prioritization,
>>>> and it's not clear how those would be reflected in what you propose.
>>>
>>> So what I'm seeing from the current proposal is:
>>>
>>> 1. we have several defined synchronous sets
>>> 2. each set requires a quorum of k (defined per set)
>>> 3. within each set, replicas are arranged in priority order.
>>>
>>> One thing which the proposal does not implement is *names* for
>>> synchronous sets. I would also suggest that if I lose this battle and
>>> we decide to go with a single stringy GUC, that we at least use JSON
>>> instead of defining our own proprietary syntax?
>>
>> JSON would be more flexible for defining synchronous sets, but it would
>> require changing how the configuration file is parsed so that a value
>> can contain newlines.
>
> Right. Well, another reason we should be using a system catalog and not
> a single GUC ...
I assume that this takes into account the fact that you will still
need a SIGHUP to properly reload the new node information from those
catalogs and to track whether some information has been modified.
And the fact that a connection to those catalogs will be needed as
well, something that we don't have now. Another barrier to the catalog
approach is that catalogs get replicated to the standbys, and I think
that we want to avoid that. But perhaps you simply meant having an SQL
interface with some metadata, right? Perhaps I got confused by the
word 'catalog'.
>>> I'm personally not convinced that quorum and prioritization are
>>> compatible. I suggest instead that quorum and prioritization should be
>>> exclusive alternatives, that is that a synch set should be either a
>>> quorum set (with all members as equals) or a prioritization set (if rep1
>>> fails, try rep2). I can imagine use cases for either mode, but not one
>>> which would involve doing both together.
>>>
>>
>> Yep, separating the GUC parameter between prioritization and quorum
>> could also be a good idea.
>
> We're agreed, then ...
Er, I disagree here. Being able to get prioritization and quorum
working together is a requirement of this feature in my opinion. Taking
again the example above with 2 data centers, being able to define a
prioritization set on the nodes of data center 1 and a quorum set in
data center 2 would reduce the probability of failure, for example by
preventing problems where one or more nodes lag behind (improving
performance at the same time).
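
To make the combination concrete, here is a minimal sketch of a
prioritization set and a quorum set evaluated together. It is only an
illustration under stated assumptions, not what any patch on this thread
implements: the class names, the node names and the rule that a commit
is released only when every set is satisfied are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PrioritySet:
    """Hypothetical set: the first `count` connected members, in list
    order, must acknowledge (if rep1 is down, fall back to rep2)."""
    members: List[str]
    count: int = 1

    def satisfied(self, connected: Set[str], acked: Set[str]) -> bool:
        # Keep list order so that earlier members have higher priority.
        chosen = [m for m in self.members if m in connected][: self.count]
        return len(chosen) == self.count and all(m in acked for m in chosen)


@dataclass
class QuorumSet:
    """Hypothetical set: any `count` connected members must acknowledge,
    all members being treated as equals."""
    members: List[str]
    count: int

    def satisfied(self, connected: Set[str], acked: Set[str]) -> bool:
        hits = sum(1 for m in self.members if m in connected and m in acked)
        return hits >= self.count


def commit_acknowledged(groups, connected: Set[str], acked: Set[str]) -> bool:
    # Assumption: the commit is released only when every set is satisfied.
    return all(g.satisfied(connected, acked) for g in groups)


# Data center 1 uses priority order, data center 2 uses a quorum of 2.
dc1 = PrioritySet(members=["dc1_a", "dc1_b"], count=1)
dc2 = QuorumSet(members=["dc2_a", "dc2_b", "dc2_c"], count=2)
all_nodes = {"dc1_a", "dc1_b", "dc2_a", "dc2_b", "dc2_c"}
```

With these definitions, a lagging node in data center 2 does not block
the commit as long as two of its three nodes have acknowledged, while
data center 1 always waits for its highest-priority connected node.
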
>> Also, I think that we must be able to decide which server to
>> promote when the master server is down.
>
> Yes, and probably my biggest issue with this patch is that it makes
> deciding which server to fail over to *more* difficult (by adding more
> synchronous options) without giving the DBA any more tools to decide how
> to fail over. Aside from "because we said we'd eventually do it", what
> real-world problem are we solving with this patch?
Hm. This patch needs to be coupled with improvements to
pg_stat_replication to be able to represent a node tree, basically by
adding the group to which a node is assigned. I can draft that if needed,
I am just a bit too lazy now...
Honestly, this is not a matter of tooling. Even today, a DBA who wants
to change s_s_names (synchronous_standby_names) without touching
postgresql.conf can just run ALTER SYSTEM and then reload the parameters.
> It's always been a problem that one can accomplish a de-facto
> denial-of-service by joining a cluster using the same application_name
> as the synch standby, more so because it's far too easy to do that
> accidentally. One needs to simply make the mistake of copying
> recovery.conf from the synch replica instead of the async replica, and
> you've created a reliability problem.
That's a scripting problem then. There are many ways to make a mistake
in this area when setting up a standby. The application_name value is
one; you can do worse by pointing to an incorrect IP, missing a
firewall filter or pointing to an incorrect port.
> Also, the fact that we use application_name for synch_standby groups
> prevents us from giving the standbys in the group their own names for
> identification purposes. It's only the fact that synchronous groups are
> relatively useless in the current feature set that's prevented this from
> being a real operational problem; if we implement quorum commit, then
> users are going to want to use groups more often and will want to
> identify the members of the group, and not just by IP address.
Managing groups in the synchronous protocol adds one level of
complexity for the operator, while what I had in mind at first was to
let a user pass to the server a formula that decides whether
synchronous_commit is validated. In any case, thinking about it now,
this feels like a different feature.
> We *really* should have discussed this feature at PGCon.
What is done is done. Sawada-san and I met last weekend, and we
agreed to get a clear picture of a spec for this feature on this thread
before doing any coding. So let's continue the discussion.
--
Michael