From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Reorder shutdown sequence, to flush pgstats later |
Date: | 2025-01-14 23:55:03 |
Message-ID: | vhljhxaipf3b4vy2azovme33g7v43ysm5f4d3gokgsby3k3iwd@7kf7bzbiism2 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-01-14 13:06:31 +0000, Bertrand Drouvot wrote:
> On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote:
> > The current code has the weird behaviour of going through PM_WAIT_BACKENDS. I
> > left it like that for now. In fact more paths do so now, because we did so for
> > PM_WAIT_XLOG_SHUTDOWN but *didn't* do so for PM_WAIT_XLOG_ARCHIVAL, which
> > seems rather weird.
> >
> > I suspect it'd be better to switch directly to PM_DEAD_END and have
> > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left
> > for another time.
>
> That would probably make more sense, but would be more invasive and would
> need careful testing. Worth a XXX comment in the code?
There is, although I guess it could be reformulated a bit.
> ======== Patch 0003:
>
> It replaces the take_action boolean flag with an early return.
>
> One comment:
>
> === 1
>
> + TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT);
>
> if (Shutdown != ImmediateShutdown)
> FatalError = true;
>
> I think we can avoid this remaining extra check and set FatalError to true
> unconditionally (as we already know that Shutdown != ImmediateShutdown as per
> the first check in the function).
>
> ======== Patch 0004:
>
> One comment:
>
> === 2
>
> As per comment === 1 above, then we could remove:
>
> if (Shutdown != ImmediateShutdown)
> FatalError = true;
>
> from HandleFatalError(). And that would be up to the HandleFatalError() caller
> to set FatalError to true (prior calling HandleFatalError()).
>
> HandleFatalError becomes a pure "transition to error state" function then.
> Does that make sense to you?
I don't see what we'd gain from moving FatalError = true separate from
HandleFatalError? All it'll mean is more copied code?
> === 4
>
> + * XXX: Is it worth inventing a different PMQUIT value
> + * that signals that the cluster is in a bad state,
> + * without a process having crashed?
> */
>
> That would be more "accurate". Something like PMQUIT_FORK_FAILURE or such.
For now I think I'll leave it as such, as we'd need to have a new message in
quickdie(). While the primary message for PMQUIT_FOR_CRASH message would
differ, the rest seems like it'd largely be duplicated:
case PMQUIT_FOR_CRASH:
/* A crash-and-restart cycle is in progress */
ereport(WARNING_CLIENT_ONLY,
(errcode(ERRCODE_CRASH_SHUTDOWN),
errmsg("terminating connection because of crash of another server process"),
errdetail("The postmaster has commanded this server process to roll back"
" the current transaction and exit, because another"
" server process exited abnormally and possibly corrupted"
" shared memory."),
errhint("In a moment you should be able to reconnect to the"
" database and repeat your command.")));
break;
That seems like a separate change. Particularly because I'm just working on
this as part of some nasty three layer deep dependency chain - aio is pretty
big on its own...
> > I suspect it'd be better to switch directly to PM_DEAD_END and have
> > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left
> > for another time.
>
> Before doing so, what do you think about:
>
> 1. keeping FatalError = true; as suggeted above
> 2. put back the UpdatePMState(PM_WAIT_DEAD_END) (prior to the HandleFatalError(
> PMQUIT_FOR_CRASH, false) call
I don't like that, what's the point of having something like HandleFatalError
if we duplicate more code than it saves to each place?
I don't think we need to either, we can just do that in the relevant case
statement in HandleFatalError(). That works, I went back and forth several
times :)
> 3. put the ConfigurePostmasterWaitSet(false) in the PM_WAIT_DEAD_END switch in
> HandleFatalError()?
That one would make sense to me.
> But that would also need to call TerminateChildren() (a second time) again after
> ConfigurePostmasterWaitSet() which is not ideal though (unless we move the
> TerminateChildren() calls in the switch or add a check on PM_WAIT_DEAD_END in
> the first call).
Why would it need to?
> === 8
>
> + * Now that everyone important is gone
>
> s/everyone important is/walsenders and archiver are also gone/?
I'd rather get away from these lists of specific processes - they end up being
noisy in the git history because the lines get updated to add/remove items (or
updates to them get missed).
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-01-14 23:58:01 | Re: pgbench error: (setshell) of script 0; execution of meta-command failed |
Previous Message | Sami Imseih | 2025-01-14 23:01:52 | Re: POC: track vacuum/analyze cumulative time per relation |