From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Clean switchover |
Date: | 2013-06-12 11:38:37 |
Message-ID: | CABUevEyT6xXtumW9mw9Gr7bu43K1+LLZK-jhi=YsWMGYRSLkaA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jun 12, 2013 at 6:41 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, when we shutdown the master, walsender tries
>> to send all the outstanding WAL records including the shutdown
>> checkpoint record to the standby, and then to exit. This basically
>> means that all the WAL records are fully synced between two servers
>> after the clean shutdown of the master. So, after promoting the standby
>> to new master, we can restart the stopped master as new standby without
>> the need for a fresh backup from new master.
>>
>> But there is one problem: though walsender tries to send all the
>> outstanding WAL records, it doesn't wait for them to be replicated to
>> the standby. IOW, walsender closes the replication connection as soon
>> as it sends WAL records.
>> Then, before receiving all the WAL records, walreceiver can detect the
>> closure of connection and exit. We cannot guarantee that there is no
>> missing WAL in the standby after clean shutdown of the master. In this
>> case, backup from new master is required when restarting the stopped
>> master as new standby. I have experienced this case several times,
>> especially when enabling WAL archiving.
>>
>> The attached patch fixes this problem. It just changes walsender so
>> that it waits for all the outstanding WAL records to be replicated to
>> the standby before closing the replication connection.
>>
>> You may be concerned the case where the standby gets stuck and the
>> walsender keeps waiting for the reply from that standby. In this case,
>> wal_sender_timeout detects such inactive standby and then walsender
>> ends. So even in that case, the shutdown can end.
>
> Do you think it can impact time to complete shutdown?
> After completing shutdown, user will promote standby to master, so if there
> is delay in shutdown, it can cause delay in switchover.
I'd expect a controlled switchover to happen without dataloss. Yes,
this could make it take a bit longer time, but it guarantees you don't
loose data. ISTM that if you don't care about the potential dataloss,
you can just use a faster shutdown method (e.g. immediate)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2013-06-12 11:47:46 | Re: Adding IEEE 754:2008 decimal floating point and hardware support for it |
Previous Message | Andrew Dunstan | 2013-06-12 11:31:15 | Re: JSON and unicode surrogate pairs |