From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Quorum commit for multiple synchronous replication. |
Date: | 2016-12-10 08:17:01 |
Message-ID: | CAA4eK1KDdw9JiqBFwvHT4vb_TqOJXePfEx+E=zvwmeZoRD--rw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> You could do that, but first I would code up the simplest, cleanest
>>> algorithm you can think of and see if it even shows up in a 'perf'
>>> profile. Microbenchmarking is probably overkill here unless a problem
>>> is visible on macrobenchmarks.
>>
>> This is what I would go for! The current code is doing a simple thing:
>> select the Nth element using qsort() after scanning each WAL sender's
>> values. And I think that Sawada-san got it right. Even running on my
>> laptop a pgbench run with 10 sync standbys using a data set that fits
>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>> using perf top on a non-assert, non-debug build. Hash tables and
>> allocations get a far larger share. Using the patch,
>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>> nodes. Let's kick the ball for now. An extra patch could make things
>> better later on if that's worth it.
>
> Yeah, since the both K and N could be not large these algorithm takes
> almost the same time. And current patch does simple thing. When we
> need over 100 or 1000 replication node the optimization could be
> required.
> Attached latest v9 patch.
>
Few comments:
+ * In 10.0 we support two synchronization methods, priority and
+ * quorum. The number of synchronous standbys that transactions
+ * must wait for replies from and synchronization method are specified
+ * in synchronous_standby_names. This parameter also specifies a list
+ * of standby names, which determines the priority of each standby for
+ * being chosen as a synchronous standby. In priority method, the standbys
+ * whose names appear earlier in the list are given higher priority
+ * and will be considered as synchronous. Other standby servers appearing
+ * later in this list represent potential synchronous standbys. If any of
+ * the current synchronous standbys disconnects for whatever reason,
+ * it will be replaced immediately with the next-highest-priority standby.
+ * In quorum method, the all standbys appearing in the list are
+ * considered as a candidate for quorum commit.
In the above description, is priority method represented by FIRST and
quorum method by ANY in the synchronous_standby_names syntax? If so,
it might be better to write about it explicitly.
2.
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
}
/*
- * Allocate a new 'memtuples' array, for the heap. It will hold one tuple
- * from each input tape.
- */
- state->memtupsize = numInputTapes;
- state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
- USEMEM(state, GetMemoryChunkSpace(state->memtuples));
-
- /*
- * Use all the remaining memory we have available for read buffers among
- * the input tapes.
+ * Use all the spare memory we have available for read buffers among the
+ * input tapes.
This doesn't belong to this patch.
3.
+ * Return the list of sync standbys using according to synchronous method,
In above sentence, "using according" seems to either incomplete or wrong usage.
4.
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ "invalid synchronization method is specified \"%d\"",
+ SyncRepConfig->sync_method));
Here, the error message doesn't seem to aligned and you might want to
use errmsg for the same.
5.
+ * In priority method, we need the oldest these positions among sync
+ * standbys. In quorum method, we need the newest these positions
+ * specified by SyncRepConfig->num_sync.
/oldest these/oldest of these
/newest these positions specified/newest of these positions as specified
Instead of newest, can we consider to use latest?
6.
+ int sync_method; /* synchronization method */
/* member_names contains nmembers consecutive nul-terminated C strings */
char member_names[FLEXIBLE_ARRAY_MEMBER];
} SyncRepConfigData;
Can't we use 1 or 2 bytes to store sync_method information?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2016-12-10 12:06:17 | Re: Parallel bitmap heap scan |
Previous Message | Petr Jelinek | 2016-12-10 07:50:12 | Re: Logical Replication WIP |