Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-06-02 16:04:40
Message-ID: CAHGQGwFFn_xmvP5bXpVYU363a=wG2GRt5o25VQ5AbiHqnPJrdw@mail.gmail.com
Lists: pgsql-bugs

On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, May 28, 2015 at 7:07 PM, <feikesteenbergen(at)gmail(dot)com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 13368
>> Logged by: Feike Steenbergen
>> Email address: feikesteenbergen(at)gmail(dot)com
>> PostgreSQL version: 9.4.2
>> Operating system: Debian 8.0 x86_64
>> Description:
>>
>> We sometimes see a standby server promoting itself to master immediately.
>>
>> Analysis shows us that the master still has a promote file in the PGDATA
>> directory. We assume the presence of the promote file (which is copied
>> by pg_basebackup) is triggering the promotion.
>
> If there is a promote file in PGDATA when a standby starts up,
> promotion will be triggered.
>
>> The master itself previously was a standby server. The promotion was done
>> using pg_ctl promote. Analysis of our logs shows that we sent pg_ctl promote
>> twice to this cluster; this is also reflected in the server log, where
>> "server promoting" shows up twice.
>
> In this case promotion is triggered by CheckForStandbyTrigger(), where
> the promote file is unlinked.
>
>> Some testing shows us that in some cases, when pg_ctl promote is called
>> multiple times, a promote file is left in the PGDATA directory, even
>> though the cluster has been successfully promoted and is accepting
>> read/write queries.
>
> This is not surprising. pg_ctl decides whether a node can be promoted
> based on whether recovery.conf exists, and there is an interval of time
> between the point where promotion is actually triggered and the point
> where recovery.conf is removed, so you can create a promote file even
> after sending SIGUSR1 to the standby's postmaster.
>
>> We will try to work around this issue by ensuring we do not send multiple
>> promote requests using pg_ctl to the same cluster.
>
> Well, we could for example have the server switch promote to
> promote_done in CheckForStandbyTrigger() and then unlink it when
> recovery.conf is switched to .done. Opinions are welcome on the
> matter.

Or we can just always remove the signal file at the end of recovery.
That filename switch seems unnecessary.
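
To illustrate the idea, here is a toy Python model of that cleanup. It is
only a sketch, not the server's C code: the function name is made up, and
it assumes the 9.4-era file names "promote", "recovery.conf" and
"recovery.done" in PGDATA.

```python
import os

# Assumed name of the signal file that "pg_ctl promote" creates in PGDATA.
PROMOTE_SIGNAL_FILE = "promote"


def end_of_recovery_cleanup(pgdata):
    """Toy model (hypothetical helper): finish recovery and always drop
    any leftover promote signal file, however many times it was created."""
    conf = os.path.join(pgdata, "recovery.conf")
    if os.path.exists(conf):
        # Mark recovery as finished, as the server does with recovery.done.
        os.replace(conf, os.path.join(pgdata, "recovery.done"))
    try:
        # Unconditional removal: a second "pg_ctl promote" that raced with
        # the first one cannot leave a stale file behind.
        os.unlink(os.path.join(pgdata, PROMOTE_SIGNAL_FILE))
    except FileNotFoundError:
        pass  # nothing left behind, which is the normal case
```

Because the removal does not depend on how promotion was triggered, calling
it a second time is harmless.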

In addition to that change, should we also make pg_basebackup skip
the signal file?
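
Roughly what I mean by skipping, as a toy Python filter over the top level
of PGDATA. This is a sketch under assumptions, not how the base backup code
is actually structured: the function name and the skip list are
hypothetical, and "promote" is assumed to be the signal file name.

```python
import os

# Hypothetical skip list: files that only signal the running server and
# should not be carried over into a new standby's data directory.
SKIPPED_SIGNAL_FILES = {"promote"}


def files_for_base_backup(pgdata):
    """Return the top-level PGDATA entries a backup would include,
    leaving out any signal file (illustrative only)."""
    return sorted(
        name for name in os.listdir(pgdata)
        if name not in SKIPPED_SIGNAL_FILES
    )
```

With this, a stale promote file on the master would never reach the new
standby, even if the server-side cleanup missed it.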

Regards,

--
Fujii Masao
