From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Support for N synchronous standby servers - take 2 |
Date: | 2016-01-19 05:55:22 |
Message-ID: | CAB7nPqRaCyO99tgBuZ4Ox6Abg85+mCTG8o_2Y4a_2rtEa9bhkw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>>> * Confirm that the data is synchronously replicated to multiple
>>> standbys in same cases.
>>> * case 1 : The standby which is not listed in s_s_name, is down
>>> * case 2 : The standby which is listed in s_s_names but potential
>>> standby, is down
>>> * case 3 : The standby which is considered as sync standby, is down.
>>> * Standby promotion
>>>
>>> In order to confirm that the commit isn't done in case #3 forever
>>> unless new sync standby is up, I think we need the framework that
>>> cancels executing query.
>>> That is, what I'm planning is,
>>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>>> 2. Set up two standby servers
>>> 3. Standby1 is down
>>> 4. Create some contents on master (But transaction is not committed)
>>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>>> standby2 makes progress)
>>
>> This will need some thinking and is not as easy as it sounds. There is
>> no way to hold on a connection after executing a query in the current
>> TAP infrastructure. You are just mentioning case 3, but actually cases
>> 1 and 2 are falling into the same need: if there is a failure we want
>> to be able to not be stuck in the test forever and have a way to
>> cancel a query execution at will. TAP uses psql -c to execute any sql
>> queries, but we would need something that is far lower-level, and that
>> would be basically using the perl driver for Postgres or an equivalent
>> here.
>>
>> Honestly for those tests I just thought that we could get to something
>> reliable by just looking at how each sync replication setup reflects
>> in pg_stat_replication as the flow is really getting complicated,
>> giving to the user a clear representation at SQL level of what is
>> actually occurring in the server depending on the configuration used
>> being important here.
>
> I see.
> We could check the transition of sync_state in pg_stat_replication.
> I think it means that it tests for each replication method (switching
> state) rather than synchronization of replication.
>
> What I'm planning to have are,
> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
> * Standby promotion
> * Standby catching up master
> And each replication method has above tests.
>
> Are these enough?
Does promoting the standby and checking that it caught really have
value in this context of this patch? What we just want to know is on a
master, which nodes need to be waited for when s_s_names or any other
method is used, no?
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-01-19 06:32:47 | Re: Proposal: Trigonometric functions in degrees |
Previous Message | David Rowley | 2016-01-19 05:32:36 | Re: Combining Aggregates |