Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: feikesteenbergen(at)gmail(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-06-01 08:19:41
Message-ID: CAB7nPqTzr3fySUdTNmcOUQxAJk7m7V9eOXqfcvBYvYoGiErsUg@mail.gmail.com
Lists: pgsql-bugs

On Thu, May 28, 2015 at 7:07 PM, <feikesteenbergen(at)gmail(dot)com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 13368
> Logged by: Feike Steenbergen
> Email address: feikesteenbergen(at)gmail(dot)com
> PostgreSQL version: 9.4.2
> Operating system: Debian 8.0 x86_64
> Description:
>
> We sometimes see a standby server promoting itself to master immediately.
>
> Analysis shows us that the master still has a promote file in the PGDATA
> directory. We assume the presence of the promote file (which is copied
> by pg_basebackup) is triggering the promotion.

If there is a promote file in PGDATA when a standby starts up,
promotion will be triggered.
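Because pg_basebackup copies the promote file along with the rest of PGDATA, one defensive step is to check the freshly restored data directory for it before starting the standby. The sketch below is a hypothetical pre-start helper, not part of PostgreSQL; the function name and the idea of running it between pg_basebackup and server start are assumptions.

```python
import os
import tempfile

# Name of the promotion signal file in PGDATA, as discussed in this thread.
PROMOTE_SIGNAL_FILE = "promote"


def remove_stale_promote_file(pgdata: str) -> bool:
    """Hypothetical pre-start check for a standby restored with pg_basebackup.

    Deletes a leftover promote file so the standby does not promote itself
    as soon as it starts. Returns True if a file was removed.
    """
    path = os.path.join(pgdata, PROMOTE_SIGNAL_FILE)
    try:
        os.unlink(path)
    except FileNotFoundError:
        return False  # nothing was copied over, nothing to do
    return True


if __name__ == "__main__":
    # Simulate a base backup taken from a master that still has a promote file.
    with tempfile.TemporaryDirectory() as pgdata:
        open(os.path.join(pgdata, PROMOTE_SIGNAL_FILE), "w").close()
        print(remove_stale_promote_file(pgdata))  # True: stale file removed
        print(remove_stale_promote_file(pgdata))  # False: already clean
```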

> The master itself was previously a standby server. The promotion was done
> using pg_ctl promote. Analysis of our logs shows that we sent pg_ctl promote
> twice to this cluster; this is also reflected in the server log, where
> "server promoting" shows up twice.

In this case promotion is triggered by CheckForStandbyTrigger(), where
the promote file is unlinked.

> Some testing shows us that in some cases, when pg_ctl promote is called
> multiple times, a promote file is left in the PGDATA directory, even though
> the cluster has been successfully promoted and is accepting read/write queries.

This is not surprising. pg_ctl decides whether a node can be promoted
based on whether recovery.conf exists, and there is a window between the
moment the promotion is actually triggered and the moment recovery.conf
is renamed to recovery.done. During that window a second pg_ctl promote
can create a new promote file, even after the first request has already
sent SIGUSR1 to the standby's postmaster.
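The interleaving can be replayed with a small, simplified model of the signal files. This is a sketch under stated assumptions, not PostgreSQL code: the function names are hypothetical, pg_ctl is reduced to "refuse unless recovery.conf exists, else write promote", the startup process to "unlink promote if present", and the SIGUSR1 signalling is omitted.

```python
import os
import tempfile

RECOVERY_CONF = "recovery.conf"
RECOVERY_DONE = "recovery.done"
PROMOTE = "promote"


def pg_ctl_promote(pgdata: str) -> bool:
    """Hypothetical model of pg_ctl promote: only checks for recovery.conf."""
    if not os.path.exists(os.path.join(pgdata, RECOVERY_CONF)):
        return False  # refused: node no longer looks like a standby
    open(os.path.join(pgdata, PROMOTE), "w").close()
    return True


def check_for_standby_trigger(pgdata: str) -> bool:
    """Hypothetical model of CheckForStandbyTrigger(): consume the file."""
    path = os.path.join(pgdata, PROMOTE)
    if os.path.exists(path):
        os.unlink(path)
        return True
    return False


def end_of_recovery(pgdata: str) -> None:
    """Hypothetical model of recovery.conf being switched to .done."""
    os.rename(os.path.join(pgdata, RECOVERY_CONF),
              os.path.join(pgdata, RECOVERY_DONE))


def replay_double_promote(pgdata: str) -> bool:
    """Replay the racy sequence; return True if a promote file is left over."""
    open(os.path.join(pgdata, RECOVERY_CONF), "w").close()
    pg_ctl_promote(pgdata)             # first request writes promote
    check_for_standby_trigger(pgdata)  # server unlinks it, promotion starts
    pg_ctl_promote(pgdata)             # second request: recovery.conf still there
    end_of_recovery(pgdata)            # recovery.conf -> recovery.done
    # Nothing will consume the second promote file any more.
    return os.path.exists(os.path.join(pgdata, PROMOTE))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pgdata:
        print(replay_double_promote(pgdata))  # True: stale promote file
```

The leftover file is then copied by any later pg_basebackup taken from this node, which matches the behaviour described in the report.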

> We will try to work around this issue by ensuring we do not send multiple
> promote requests using pg_ctl to the same cluster.

Well, we could for example have the server rename promote to
promote_done in CheckForStandbyTrigger() and then unlink it when
recovery.conf is renamed to recovery.done. Opinions are welcome on the
matter.
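The suggested server-side change can be sketched with the same kind of simplified file model. All names here are hypothetical, and the cleanup step is an interpretation: the post does not spell out what "it" covers, so this sketch assumes that, once recovery.conf has been renamed, the server removes both promote_done and any promote file that a late pg_ctl promote may have written.

```python
import os
import tempfile

RECOVERY_CONF = "recovery.conf"
RECOVERY_DONE = "recovery.done"
PROMOTE = "promote"
PROMOTE_DONE = "promote_done"


def _p(pgdata: str, name: str) -> str:
    return os.path.join(pgdata, name)


def pg_ctl_promote(pgdata: str) -> bool:
    # Client side unchanged: request promotion while recovery.conf exists.
    if not os.path.exists(_p(pgdata, RECOVERY_CONF)):
        return False
    open(_p(pgdata, PROMOTE), "w").close()
    return True


def check_for_standby_trigger(pgdata: str) -> bool:
    # Proposed: rename promote to promote_done instead of unlinking it.
    if os.path.exists(_p(pgdata, PROMOTE)):
        os.replace(_p(pgdata, PROMOTE), _p(pgdata, PROMOTE_DONE))
        return True
    return False


def end_of_recovery(pgdata: str) -> None:
    # Rename first, so later pg_ctl promote calls are refused...
    os.rename(_p(pgdata, RECOVERY_CONF), _p(pgdata, RECOVERY_DONE))
    # ...then clean up. Removing PROMOTE as well is an assumption.
    for name in (PROMOTE_DONE, PROMOTE):
        try:
            os.unlink(_p(pgdata, name))
        except FileNotFoundError:
            pass


def replay_double_promote(pgdata: str) -> bool:
    """Replay the double request; return True if a promote file is left over."""
    open(_p(pgdata, RECOVERY_CONF), "w").close()
    pg_ctl_promote(pgdata)             # first request
    check_for_standby_trigger(pgdata)  # promote -> promote_done
    pg_ctl_promote(pgdata)             # late second request still accepted
    end_of_recovery(pgdata)            # both signal files removed
    return os.path.exists(_p(pgdata, PROMOTE))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pgdata:
        print(replay_double_promote(pgdata))  # False: no stale promote file
```

This model runs the steps strictly in sequence; a real pg_ctl still checks for recovery.conf and writes promote as two separate steps, so a very small window remains that a sequential replay like this one cannot show.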
--
Michael
