Re: Reorder shutdown sequence, to flush pgstats later

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Reorder shutdown sequence, to flush pgstats later
Date: 2025-01-13 22:33:41
Message-ID: q66sn2upj574oow3djdomszqqc563oiqwmsnu4s2zo5mrh74l5@4v7g5n4d35w7
Lists: pgsql-hackers

Hi,

On 2025-01-13 12:20:39 +0000, Bertrand Drouvot wrote:
> On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote:
> > There wasn't
> > any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the wrong state as
> > fatal. I think this needs to be treated the same way we treat not being able
> > to fork checkpointer during a shutdown. Now receiving
> > PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers
> > FatalError processing after logging "WAL was shut down unexpectedly".
>
> I wonder if we could first ask the postmaster for confirmation that the SIGINT is
> coming from it? (and if not, then simply ignore the SIGINT). I thought about a
> shared memory flag, but the postmaster has to be kept away from shared memory
> operations, so that's not a valid idea. Another idea could be that the checkpointer
> sends a new "confirm" request to the postmaster first. But then I'm not sure how
> the postmaster could reply back (given the signals that are already being used)
> and that might overcomplicate things.

That sounds more complicated than it's worth for a hypothetical risk.
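
To make that concrete, here is a toy model of the check described above --
simplified and standalone, not the actual postmaster.c code or the patch
itself:

    /* Toy model, compiles standalone: cc -o pmtoy pmtoy.c */
    #include <stdio.h>

    typedef enum
    {
        PM_RUN,
        PM_WAIT_XLOG_SHUTDOWN,
        PM_WAIT_CHECKPOINTER
    } PMState;

    static PMState pmState = PM_RUN;
    static int FatalError = 0;

    /* called when the postmaster sees PMSIGNAL_XLOG_IS_SHUTDOWN */
    static void
    process_xlog_is_shutdown(void)
    {
        if (pmState != PM_WAIT_XLOG_SHUTDOWN)
        {
            /*
             * The signal can only legitimately arrive while we are
             * waiting for checkpointer to shut down WAL; anything else
             * is treated like failing to fork checkpointer at shutdown.
             */
            fprintf(stderr, "WAL was shut down unexpectedly\n");
            FatalError = 1;
            return;
        }
        pmState = PM_WAIT_CHECKPOINTER; /* normal shutdown progress */
    }

    int
    main(void)
    {
        process_xlog_is_shutdown();     /* wrong state -> FatalError path */
        printf("FatalError = %d\n", FatalError);
        return 0;
    }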

> > Maybe PM_WAIT_CHECKPOINTER should be before PM_WAIT_DEAD_END? I earlier
> > thought it'd be better to shut checkpointer down after even dead-end children
> > exited, in case we ever wanted to introduce stats for dead-end children - but
> > that seems rather unlikely?
>
> Yeah, given the above, it looks more clean to change that ordering.

I'll post a version that does that in a bit.
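
Concretely, that would mean an ordering roughly like the following (a
hypothetical sketch only; the real PMState enum has more states and the
version I post may differ):

    /* hypothetical shutdown-ordering sketch, not the actual enum */
    typedef enum
    {
        PM_RUN,                 /* normal operation */
        PM_STOP_BACKENDS,       /* telling backends to exit */
        PM_WAIT_BACKENDS,       /* waiting for regular children to exit */
        PM_WAIT_XLOG_SHUTDOWN,  /* checkpointer writing shutdown checkpoint */
        PM_WAIT_CHECKPOINTER,   /* checkpointer exiting, flushing pgstats */
        PM_WAIT_DEAD_END,       /* only dead-end children remain */
        PM_NO_CHILDREN          /* all children gone */
    } PMState;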

> > Independent of this patch, we seem to deal with FatalError processing
> > somewhat inconsistently. We have this comment (in master):
> >     /*
> >      * PM_WAIT_DEAD_END state ends when all other children are gone except
> >      * for the logger. During normal shutdown, all that remains are
> >      * dead-end backends, but in FatalError processing we jump straight
> >      * here with more processes remaining. Note that they have already
> >      * been sent appropriate shutdown signals, either during a normal
> >      * state transition leading up to PM_WAIT_DEAD_END, or during
> >      * FatalError processing.
> >
> > However that doesn't actually appear to be true in the most common way to
> > reach FatalError, via HandleChildCrash():
> >
> >     if (Shutdown != ImmediateShutdown)
> >         FatalError = true;
> >
> >     /* We now transit into a state of waiting for children to die */
> >     if (pmState == PM_RECOVERY ||
> >         pmState == PM_HOT_STANDBY ||
> >         pmState == PM_RUN ||
> >         pmState == PM_STOP_BACKENDS ||
> >         pmState == PM_WAIT_XLOG_SHUTDOWN)
> >         UpdatePMState(PM_WAIT_BACKENDS);
> >
> > Here we
> > a) don't directly go to PM_WAIT_DEAD_END, but PM_WAIT_BACKENDS
> >
> > From PM_WAIT_BACKENDS we'll switch to PM_WAIT_DEAD_END once all processes
> > other than walsender, archiver and dead-end children have exited.
> >
> > b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) we go *back* to
> > PM_WAIT_BACKENDS?
> >
> > This comment seems to have been added in
> >
> > commit a78af0427015449269fb7d9d8c6057cfcb740149
> > Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> > Date: 2024-11-14 16:12:28 +0200
> >
> > Assign a child slot to every postmaster child process
> >
> > ISTM that section of the comment is largely wrong and has never really been
> > the case if my git sleuthing is correct?
>
> I agree that does not look correct, but I don't think it's coming from a78af0427015449269fb7d9d8c6057cfcb740149;
> ISTM that a78af0427015449269fb7d9d8c6057cfcb740149 just re-worded this
> already-wrong comment:
>
> "
>     /*
>      * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
>      * (ie, no dead_end children remain), and the archiver is gone too.
>      *
>      * The reason we wait for those two is to protect them against a new
>      * postmaster starting conflicting subprocesses; this isn't an
>      * ironclad protection, but it at least helps in the
>      * shutdown-and-immediately-restart scenario. Note that they have
>      * already been sent appropriate shutdown signals, either during a
>      * normal state transition leading up to PM_WAIT_DEAD_END, or during
>      * FatalError processing.
>      */
> "

Hm. This version doesn't really seem wrong in the same way / to the same
degree.

> while we could also see:
>
> "
>     if (Shutdown != ImmediateShutdown)
>         FatalError = true;
>
>     /* We now transit into a state of waiting for children to die */
>     if (pmState == PM_RECOVERY ||
>         pmState == PM_HOT_STANDBY ||
>         pmState == PM_RUN ||
>         pmState == PM_STOP_BACKENDS ||
>         pmState == PM_SHUTDOWN)
>         pmState = PM_WAIT_BACKENDS;
> "

I don't think this makes the version of the comment you included above
inaccurate? It's saying that the signal has been sent in the state *leading
up* to PM_WAIT_DEAD_END, which does seem accurate? The code the comment is
about just won't be reached until the postmaster transitions to PM_WAIT_DEAD_END.
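
To spell the transition out compactly (again a toy model, not the real
code): a crash in PM_WAIT_XLOG_SHUTDOWN moves *back* to PM_WAIT_BACKENDS,
and PM_WAIT_DEAD_END is only entered later, once the remaining children
have exited:

    /* toy model of the quoted HandleChildCrash() transition */
    #include <stdio.h>

    typedef enum
    {
        PM_RECOVERY, PM_HOT_STANDBY, PM_RUN, PM_STOP_BACKENDS,
        PM_WAIT_BACKENDS, PM_WAIT_XLOG_SHUTDOWN, PM_WAIT_DEAD_END
    } PMState;

    static PMState
    crash_transition(PMState s)
    {
        /*
         * Mirrors the quoted if-condition: PM_WAIT_XLOG_SHUTDOWN goes
         * *back* to PM_WAIT_BACKENDS, and we never jump straight to
         * PM_WAIT_DEAD_END from here.
         */
        if (s == PM_RECOVERY || s == PM_HOT_STANDBY || s == PM_RUN ||
            s == PM_STOP_BACKENDS || s == PM_WAIT_XLOG_SHUTDOWN)
            return PM_WAIT_BACKENDS;
        return s;
    }

    int
    main(void)
    {
        printf("%d\n", crash_transition(PM_WAIT_XLOG_SHUTDOWN) == PM_WAIT_BACKENDS);
        return 0;
    }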

> > but I suspect that's just about unreachable these days. If checkpointer exits
> > outside of shutdown processing, it's treated as a reason to crash-restart:
> >     /*
> >      * Any unexpected exit of the checkpointer (including FATAL
> >      * exit) is treated as a crash.
> >      */
> >     HandleChildCrash(pid, exitstatus,
> >                      _("checkpointer process"));
> >
>
> I do agree. I was trying to reach this code path while replying to one of
> the above comments (about PM_WAIT_DEAD_END ordering) but had to replace the
> if condition with another one to reach it during testing.

Unfortunately, I was able to reach it:

1) Repeated fork failures of checkpointer can lead to it.

2) When crash-restarting, we don't start checkpointer immediately, but defer
   that until LaunchMissingBackgroundProcesses(). While not easy, it's
   possible to trigger a shutdown before that happens.
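
For 2), the window has roughly this shape (a simplified sketch from memory,
not quoted from postmaster.c): after a crash the checkpointer is gone, and
it is only relaunched lazily from the server's main loop, so a shutdown
request can arrive while no checkpointer exists:

    /* simplified sketch of the crash-restart window, not real code */
    #include <stdbool.h>
    #include <stdio.h>

    static bool checkpointer_running = false;

    /* normally run from the postmaster main loop */
    static void
    LaunchMissingBackgroundProcesses(void)
    {
        if (!checkpointer_running)
            checkpointer_running = true;  /* StartChildProcess(...) */
    }

    static void
    handle_shutdown_request(void)
    {
        if (!checkpointer_running)
            printf("shutdown with no checkpointer running\n");
    }

    int
    main(void)
    {
        /* crash-restart happened: checkpointer gone, not yet relaunched */
        handle_shutdown_request();           /* the hard-to-hit window */
        LaunchMissingBackgroundProcesses();  /* would normally run first */
        return 0;
    }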

Greetings,

Andres Freund
