From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hot standby fails if any backend crashes |
Date: | 2012-02-03 05:22:35 |
Message-ID: | CAHGQGwEb5JSkxxXXaFcsWfky1R_JKrrgLhZkf3KBq4X-k6gm5A@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 3, 2012 at 1:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I'm currently working with Duncan Rance's test case for bug #6425, and
>> I am observing a very nasty behavior in HEAD: once one of the
>> hot-standby query backends crashes, the standby postmaster SIGQUIT's
>> all its children and then just quits itself, with no log message and
>> apparently no effort to restart. Surely this is not intended?
>
> I looked through postmaster.c and found that the cause of this is pretty
> obvious: if the startup process exits with any non-zero status, we
> assume that represents an unrecoverable error condition, and set
> RecoveryError which causes the postmaster to exit silently as soon as
> its last child is gone. But we do this even if the reason the startup
> process did exit(1) is that we sent it SIGQUIT as a result of a crash of
> some other process. Of course this logic dates from a time where the
> startup process could not have any siblings, so when it was written,
> such a thing was impossible.
>
> I think saner behavior might only require this change:
>
>             /*
>              * Any unexpected exit (including FATAL exit) of the startup
>              * process is treated as a crash, except that we don't want to
>              * reinitialize.
>              */
>             if (!EXIT_STATUS_0(exitstatus))
>             {
> -               RecoveryError = true;
> +               if (!FatalError)
> +                   RecoveryError = true;
>                 HandleChildCrash(pid, exitstatus,
>                                  _("startup process"));
>                 continue;
>             }
>
> plus suitable comment adjustments of course. Haven't tested this yet
> though.
Looks good to me.
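To spell out the decision the patch makes, here is a minimal standalone sketch. It is not postmaster.c code: the function name recovery_error_after_startup_exit and its parameters are hypothetical, and fatal_error only models the postmaster's FatalError flag, assumed to be already set when another child has crashed and the postmaster has sent SIGQUIT to the remaining children.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the patched check, for illustration only.
 *
 * exitstatus:  exit status of the startup process (0 means clean exit).
 * fatal_error: models FatalError, i.e. crash handling is already in
 *              progress because some other child process crashed.
 *
 * Returns true when the exit should be recorded as an unrecoverable
 * recovery error (the case in which RecoveryError would be set).
 */
bool
recovery_error_after_startup_exit(int exitstatus, bool fatal_error)
{
    /* A clean exit is never a recovery error. */
    if (exitstatus == 0)
        return false;

    /*
     * A non-zero exit counts as a recovery error only if the startup
     * process failed on its own, not because it was told to quit while
     * the postmaster was handling another process's crash.
     */
    return !fatal_error;
}
```

With this model, recovery_error_after_startup_exit(1, false) is true (the startup process failed by itself), while recovery_error_after_startup_exit(1, true) is false (it exited because of the SIGQUIT that followed a sibling's crash), which is the case the unpatched code gets wrong.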
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center