Re: Support for N synchronous standby servers - take 2

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-06-26 05:46:15
Message-ID: CAB7nPqQ5dOBFqUL8OfKzjJA7JGf_gqZsO+c8YwWYeZPcXgeH6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 25, 2015 at 8:32 PM, Simon Riggs wrote:
> Let's start with a complex, fully described use case then work out how to
> specify what we want.

Well, one of the most simple cases where quorum commit and this
feature would be useful for is that, with 2 data centers:
- on center 1, master A and standby B
- on center 2, standby C and standby D
With the current synchronous_standby_names, what we can do now is
ensuring that one node has acknowledged the commit of master. For
example synchronous_standby_names = 'B,C,D'. But you know that :)
What this feature would allow use to do is for example being able to
ensure that a node on the data center 2 has acknowledged the commit of
master, meaning that even if data center 1 completely lost for a
reason or another we have at least one node on center 2 that has lost
no data at transaction commit.

Now, regarding the way to express that, we need to use a concept of
node group for each element of synchronous_standby_names. A group
contains a set of elements, each element being a group or a single
node. And for each group we need to know three things when a commit
needs to be acknowledged:
- Does my group need to acknowledge the commit?
- If yes, how many elements in my group need to acknowledge it?
- Does the order of my elements matter?

That's where the micro-language idea makes sense to use. For example,
we can define a group using separators and like (elt1,...eltN) or
[elt1,elt2,eltN]. Appending a number in front of a group is essential
as well for quorum commits. Hence for example, assuming that '()' is
used for a group whose element order does not matter, if we use that:
- k(elt1,elt2,eltN) means that we need for the k elements in the set
to return true (aka commit confirmation).
- k[elt1,elt2,eltN] means that we need for the first k elements in the
set to return true.

When k is not defined for a group, k = 1. Using only elements
separated by commas for the upper group means that we wait for the
first element in the set (for backward compatibility), hence:
1(elt1,elt2,eltN) <=> elt1,elt2,eltN

We could as well mix each behavior, aka being able to define for a
group to wait for the first k elements and a total of j elements in
the whole set, but I don't think that we need to go that far. I
suspect that in most cases users will be satisfied with only cases
where there is a group of data centers, and they want to be sure that
one or two in each center has acknowledged a commit to master
(performance is not the matter here if centers are not close). Hence
in the case above, you could get the behavior wanted with this
definition:
2(B,(C,D))
With more data centers, like 3 (wait for two nodes in the 3rd set):
3(B,(C,D),2(E,F,G))
Users could define more levels of group, like that:
2(A,(B,(C,D)))
But that's actually something few people would do in real cases.

> I'm nervous of "it would be good ifs" because we do a ton of work only to
> find a design flaw.

That makes sense. Let's continue arguing on it then.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-06-26 05:59:52 Re: Support for N synchronous standby servers - take 2
Previous Message Amit Kapila 2015-06-26 03:57:58 Re: RFC: replace pg_stat_activity.waiting with something more descriptive