From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Quorum commit for multiple synchronous replication. |
Date: | 2016-11-07 15:25:24 |
Message-ID: | CAD21AoC9ZgYEJpw+75AJPM_NUbfqmSeG8q2EMP5ubKHM9dJ-zg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Attached latest patch.
>> Please review it.
>
> Okay, so let's move on with this patch...
Thank you for reviewing this patch.
> + <para>
> + The keyword <literal>ANY</> is omissible, but note that there is
> + not compatibility between <productname>PostgreSQL</> version 10 and
> + 9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the
> + configuration with <literal>FIRST</> and <replaceable
> class="parameter">
> + num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6
> + or before. On the other hand, It's the same as the configuration with
> + <literal>ANY</> and <replaceable
> class="parameter">num_sync</> equal to
> + 1 in <productname>PostgreSQL</> 10 or later.
> + </para>
> This paragraph could be reworded:
> If FIRST or ANY are not specified, this parameter behaves as ANY. Note
> that this grammar is incompatible with PostgreSQL 9.6, where no
> keyword specified is equivalent as if FIRST was specified.
> In short, there is no real need to specify num_sync as this behavior
> does not have changed, as well as it is not necessary to mention
> pre-9.6 versions as the multi-sync grammar has been added in 9.6.
Fixed.
> - Specifying more than one standby name can allow very high availability.
> Why removing this sentence?
>
> + The keyword <literal>ANY</>, coupeld with an interger number N,
> s/coupeld/coupled/ and s/interger/integer/, for a double hit in one
> line, still...
>
> + The keyword <literal>ANY</>, coupeld with an interger number N,
> + chooses N standbys in a set of standbys with the same, lowest,
> + priority and makes transaction commit when WAL records are received
> + those N standbys.
> This could be reworded more simply, for example: The keyword ANY,
> coupled with an integer number N, makes transaction commits wait until
> WAL records are received from N connected standbys among those defined
> in the list of synchronous_standby_names.
>
> + <literal>s2</> and <literal>s3</> wil be considered as synchronous standby
> s/wil/will/
>
> + when standby is considered as a condidate of quorum commit.</entry>
> s/condidate/candidate/
>
> [... stopping here ...] Please be more careful with the documentation
> and comment grammar. There are other things in the patch..
I fix some typo as much as I found.
> A bunch of comments at the top of syncrep.c need to be updated.
>
> +extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
> +extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);
> Those two should be static routines in syncrep.c, let's keep the whole
> logic about quorum and higher-priority definition only there and not
> bother the callers of them about that.
Fixed.
> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> + return SyncRepGetSyncStandbysPriority(am_sync);
> + else /* SYNC_REP_QUORUM */
> + return SyncRepGetSyncStandbysQuorum(am_sync);
> Both routines share the same logic to detect if a WAL sender can be
> selected as a candidate for sync evaluation or not, still per the
> selection they do I agree that it is better to keep them as separate.
>
> + /* In quroum method, all sync standby priorities are always 1 */
> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> + priority = 1;
> Honestly I don't understand why you are enforcing that. Priority can
> be important for users willing to switch from ANY to FIRST to have a
> look immediately at what are the standbys that would become sync or
> potential.
I thought that since all standbys appearing in s_s_names list are
treated equally in quorum method, these standbys should have same
priority.
If these standby have different sync_priority, it looks like that
master server replicates to standby server based on priority.
> else if (list_member_int(sync_standbys, i))
> - values[7] = CStringGetTextDatum("sync");
> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
> The comment at the top of this code block needs to be refreshed.
Fixed.
> The representation given to the user in pg_stat_replication is not
> enough IMO. For example, imagine a cluster with 4 standbys:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 0 | async
> node_5434 | 0 | async
> node_5435 | 0 | async
> node_5436 | 0 | async
> (4 rows)
>
> If FIRST N is used, is it easy for the user to understand what are the
> nodes in sync:
> =# alter system set synchronous_standby_names = 'FIRST 2 (node_5433,
> node_5434, node_5435)';
> ALTER SYSTEM
> =# select pg_reload_conf();
> pg_reload_conf
> ----------------
> t
> (1 row)
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 1 | sync
> node_5434 | 2 | sync
> node_5435 | 3 | potential
> node_5436 | 0 | async
> (4 rows)
>
> In this case it is easy to understand that two nodes are required to be in sync.
>
> When using ANY similarly for three nodes, here is what
> pg_stat_replication tells:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
> application_name | sync_priority | sync_state
> ------------------+---------------+------------
> node_5433 | 1 | quorum
> node_5434 | 1 | quorum
> node_5435 | 1 | quorum
> node_5436 | 0 | async
> (4 rows)
>
> It is not possible to guess from how many standbys this needs to wait
> for. One idea would be to mark the sync_state not as "quorum", but
> "quorum-N", or just add a new column to indicate how many in the set
> need to give a commit confirmation.
As Simon suggested before, we could support another feature that
allows the client to control the quorum number.
Considering adding that feature, I thought it's better to have and
control that information as a GUC parameter.
Thought?
Attached latest v5 patch.
Please review it.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
000_quorum_commit_v5.patch | text/x-diff | 27.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-11-07 15:29:15 | Re: Add PGDLLEXPORT to PG_FUNCTION_INFO_V1 |
Previous Message | Peter Eisentraut | 2016-11-07 15:13:48 | Re: add more NLS to bin |