Re: Unintended restart after recovery error

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unintended restart after recovery error
Date: 2014-11-12 23:30:39
Message-ID: CA+TgmoaV_KT=oTrdZ+xsm4AM69A_vMmRQRRmDWSiotd5v865iw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 12, 2014 at 4:52 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>> > While looking at postmaster.c:reaper(), one problematic case occurred to me.
>> >
>> >
>> > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED.
>> >
>> > 2. Checkpointer process is forked and immediately dies.
>> >
>> > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets
>> > FatalError to true.
>> >
>> > 4. Startup process exits with non-zero status code too - either due to SIGQUIT
>> > received from HandleChildCrash or due to some other failure of the startup
>> > process itself. However, FatalError is already set, because of the previous
>> > crash of the checkpointer. Thus reaper() does not set RecoveryError.
>> >
> >> > 5. As RecoveryError failed to be set to true, postmaster will try to restart
>> > the cluster, although it apparently should not.
>>
>> Why shouldn't postmaster restart the cluster in that case?
>>
>
> At least for the behavior to be consistent with simpler cases of failed
> recovery (e.g. any FATAL error in StartupXLOG), which end up not restarting
> the cluster.

It's true that if the startup process dies we don't try to restart,
but it's also true that if the checkpointer dies we do try to restart.
I'm not sure why this specific situation should be an exception to
that general rule.
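
To make the two rules concrete, here is a minimal sketch in C of the flag
logic as it is described in the quoted steps. It is a toy model, not the
code in postmaster.c: the struct PostmasterModel, the enum ChildKind, and
the functions model_reap() and model_will_restart() are hypothetical names
invented for illustration. Only the flag names FatalError and RecoveryError
come from the thread, and the restart condition is an assumption drawn from
step 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the postmaster state in the thread. */
typedef struct
{
    bool FatalError;    /* set once any child crash has been handled */
    bool RecoveryError; /* set only if the startup process fails first */
} PostmasterModel;

typedef enum
{
    CHILD_STARTUP,
    CHILD_CHECKPOINTER
} ChildKind;

/*
 * Model of what reaper() does with a dead child, per steps 3 and 4 above.
 * A non-zero exit of the startup process sets RecoveryError only when no
 * earlier crash has already set FatalError.
 */
static void
model_reap(PostmasterModel *pm, ChildKind kind, int exit_status)
{
    assert(pm != NULL);

    if (exit_status == 0)
        return;                 /* clean exit: nothing to record */

    if (kind == CHILD_STARTUP && !pm->FatalError)
        pm->RecoveryError = true;

    pm->FatalError = true;      /* stands in for HandleChildCrash() */
}

/*
 * Assumed restart rule (from step 5): restart after a crash unless the
 * crash was recorded as a recovery error.
 */
static bool
model_will_restart(const PostmasterModel *pm)
{
    assert(pm != NULL);
    return pm->FatalError && !pm->RecoveryError;
}
```

In this model, reaping a failed checkpointer and then a failed startup
process leaves RecoveryError unset, so a restart is attempted. Reaping a
failed startup process alone sets RecoveryError, so no restart is attempted.
Those are the two general rules above, applied in the order the children
happened to die.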

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
