Re: Better client reporting for "immediate stop" shutdowns

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Better client reporting for "immediate stop" shutdowns
Date: 2020-12-22 01:29:03
Message-ID: CALj2ACV0WBjHkX-+iHO1kNYSBbDYH_6cygA6rrP91epNRF9xBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 3:13 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Up to now, if you shut down the database with "pg_ctl stop -m immediate"
> then clients get a scary message claiming that something has crashed,
> because backends can't tell whether the SIGQUIT they got was sent for
> a crash-and-restart situation or because of an immediate-stop command.
>
> This isn't great from a fit-and-finish perspective, and it occurs to me
> that it's really easy to do better: the postmaster can stick a flag
> into shared memory explaining the reason for SIGQUIT. While we don't
> like the postmaster touching shared memory, there doesn't seem to be
> any need for interlocking or anything like that, so there is no risk
> involved that's greater than those already taken by pmsignal.c.

+1 to improve the message.

> So, here's a very simple proposed patch. Some issues for possible
> bikeshedding:
>
> * Up to now, pmsignal.c has only been for child-to-postmaster
> communication, so maybe there is some better place to put the
> support code. I can't think of one though.

+1 to have it here as we already have the required shared memory
initialization code to add in new flags there -
PMSignalState->sigquit_reason.

If I'm correct, quickdie() doesn't access any shared memory because
one of the reason we can be in quickdie() is when the shared memory
itself is corrupted(the comment down below on why we don't call
roc_exit() or atexit() says), in such case, will GetQuitSignalReason()
have some problem in accessing the shared memory i.e. + return
PMSignalState->sigquit_reason;?

> * I chose to report the same message for immediate shutdown as we
> already use for SIGTERM (fast shutdown or pg_terminate_backend()).
> Should it be different, and if so what?

AFAIK, errmsg(terminating connection due to administrator command") is
emitted when there's no specific reason. But we know exactly why we
are terminating in this case, how about having an error message like
errmsg("terminating connection due to immediate shutdown request")));
? There are other places where errmsg("terminating connection due to
XXXX reasons"); is used. We also log messages when an immediate
shutdown request is received errmsg("received immediate shutdown
request").

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-12-22 01:42:55 RE: [Patch] Optimize dropping of relation buffers using dlist
Previous Message Masahiko Sawada 2020-12-22 01:25:58 Re: Deadlock between backend and recovery may not be detected