Re: Dangling Client Backend Process

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dangling Client Backend Process
Date: 2015-10-20 14:58:24
Message-ID: CA+TgmoYUGMn4SQcsA=zScg3kqU1EMiPiRiakgrJd1+eWwMsxKQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 20, 2015 at 12:48 AM, Rajeev rastogi
<rajeev(dot)rastogi(at)huawei(dot)com> wrote:
> On 19 October 2015 21:37, Robert Haas [mailto:robertmhaas(at)gmail(dot)com] Wrote:
>
>>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
>><alvherre(at)2ndquadrant(dot)com> wrote:
>>> Andres Freund wrote:
>>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>>> > If I recall correctly, he was concerned about killing backends
>>>> > running transactions that could be saved. I have some sympathy with
>>>> > that opinion.
>>>>
>>>> I still don't. Leaving backends alive after postmaster has died
>>>> prevents the auto-restart mechanism from working from there on.
>>>> Which means that we'll potentially continue happily after another
>>>> backend has PANICed and potentially corrupted shared memory. Which
>>>> isn't all that unlikely if postmaster isn't around anymore.
>>>
>>> I agree. When postmaster terminates without waiting for all backends
>>> to go away, things are going horribly wrong -- either a DBA has done
>>> something stupid, or the system is misbehaving. As Andres says, if
>>> another backend dies at that point, things are even worse -- the dying
>>> backend could have been holding a critical lwlock, for instance, or it
>>> could have corrupted shared buffers on its way out. It is not
>>> sensible to leave the rest of the backends in the system still trying
>>> to run just because there is no one there to kill them.
>>
>>Yep. +1 for changing this.
>
> Seems many people are in favor of this change.
> I have made changes so that backends exit on postmaster death (once they have finished their work and are waiting for a new command).
> The changes follow the approach explained in my earlier mail, i.e.
> 1. WaitLatchOrSocket, called from the secure_read and secure_write functions, will wait on an additional event, WL_POSTMASTER_DEATH.
> 2. There is a possibility that the command is read without waiting on the latch. This case is handled by checking the postmaster status after the command is read (i.e. after ReadCommand).
>
> Attached is the patch.

I don't think that proc_exit(1) is the right way to exit here. It's
not very friendly to exit without at least attempting to give the
client a clue about what has gone wrong. I suggest something like
this:

    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating connection due to postmaster shutdown")));
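
To illustrate the idea behind waiting on the client socket and on postmaster death at the same time, here is a minimal standalone sketch using plain POSIX poll(). It is not the patch and not backend code. It assumes the usual Unix arrangement in which the parent holds the only write end of a pipe, so the read end reports hangup/EOF once the parent is gone. The names wait_for_client_or_postmaster_death, wait_result, client_fd and death_fd are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not PostgreSQL identifiers. */
enum wait_result
{
    WAIT_CLIENT_READABLE,
    WAIT_POSTMASTER_DEATH,
    WAIT_ERROR
};

/*
 * Block until either the client descriptor has input or the "death pipe"
 * signals that its writer (standing in for the postmaster) has gone away.
 *
 * Assumption: death_fd is the read end of a pipe whose only write end is
 * held by the parent process, so hangup/EOF on it means the parent exited.
 */
static enum wait_result
wait_for_client_or_postmaster_death(int client_fd, int death_fd)
{
    struct pollfd fds[2];
    int           rc;

    fds[0].fd = death_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = client_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    /* Retry if a signal interrupts the wait. */
    do
    {
        rc = poll(fds, 2, -1);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return WAIT_ERROR;

    /* Check parent death first so it wins when both events are pending. */
    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        return WAIT_POSTMASTER_DEATH;

    if (fds[1].revents & (POLLIN | POLLHUP | POLLERR))
        return WAIT_CLIENT_READABLE;

    return WAIT_ERROR;
}
```

In this sketch the death check takes priority over pending client input. A caller that sees WAIT_POSTMASTER_DEATH would then report the condition to the client before exiting, rather than exiting silently.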

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
