From: Andres Freund <andres(at)anarazel(dot)de>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Reorder shutdown sequence, to flush pgstats later
Date: 2025-01-10 19:07:25
Message-ID: hpkmb7tbfijhsa3qqgzbv2nbh4msyhnogyacdq6hy4jzzjajb7@ifyeyoq4fpy2
Lists: pgsql-hackers
Hi,
I've pushed 0001-0005.
I think 0006 isn't far from ready.
However, I think 0007 needs a bit more work.
One thing that worries me a bit is that, with SIGINT now triggering the
shutdown checkpoint, the checkpoint could somehow be triggered
unintentionally. Previously checkpointer would effectively have ignored such a
SIGINT, but now it'd bring the whole cluster down (because postmaster would
see an unexpected ShutdownXLOG()). That said, I can't really see a legitimate
reason for checkpointer to get spurious SIGINTs.
This made me think about what happens if a spurious SIGINT is sent - and in
the last version the answer wasn't great: everything seemed to carry on
normally, except that the cluster could no longer be shut down normally. There
wasn't any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the wrong
state as fatal. I think this needs to be treated the same way we treat not
being able to fork checkpointer during a shutdown. Now receiving
PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers
FatalError processing after logging "WAL was shut down unexpectedly". We have
multiple copies of the code to go to FatalError, which doesn't seem great.
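To make that concrete, here's a rough sketch (mine, not a quote from the
attached patches) of the kind of check I mean in process_pm_pmsignal(); the
exact state transition and log wording are assumptions on my part:

    if (CheckPostmasterSignal(PMSIGNAL_XLOG_IS_SHUTDOWN))
    {
        if (pmState == PM_WAIT_XLOG_SHUTDOWN)
        {
            /* expected: the shutdown checkpoint has completed */
            UpdatePMState(PM_WAIT_XLOG_ARCHIVAL);
        }
        else
        {
            /*
             * Unexpected, e.g. a spurious SIGINT made checkpointer run
             * ShutdownXLOG().  Handle it like failing to fork checkpointer
             * during shutdown: go to FatalError processing instead of
             * limping along with a cluster that can't be shut down cleanly.
             */
            ereport(LOG,
                    (errmsg("WAL was shut down unexpectedly")));
            FatalError = true;
            UpdatePMState(PM_WAIT_DEAD_END);
            ConfigurePostmasterWaitSet(false);
        }
    }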
Another thing I started worrying about is that the commit added
PM_WAIT_CHECKPOINTER *after* PM_WAIT_DEAD_END. However, we have a few places
where we directly jump to PM_WAIT_DEAD_END in fatal situations:
    /*
     * Stop any dead-end children and stop creating new ones.
     */
    UpdatePMState(PM_WAIT_DEAD_END);
    ConfigurePostmasterWaitSet(false);
    SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND));
and
    /*
     * If we failed to fork a checkpointer, just shut down.
     * Any required cleanup will happen at next restart. We
     * set FatalError so that an "abnormal shutdown" message
     * gets logged when we exit.
     *
     * We don't consult send_abort_for_crash here, as it's
     * unlikely that dumping cores would illuminate the reason
     * for checkpointer fork failure.
     */
    FatalError = true;
    UpdatePMState(PM_WAIT_DEAD_END);
    ConfigurePostmasterWaitSet(false);
I don't think this actively causes problems today, but it seems rather iffy.
Maybe PM_WAIT_CHECKPOINTER should come before PM_WAIT_DEAD_END? I had earlier
thought it'd be better to shut checkpointer down only after even the dead-end
children have exited, in case we ever wanted to introduce stats for dead-end
children - but that seems rather unlikely?
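For clarity, the ordering I'm wondering about would look roughly like this in
the PMState enum (state names as in master plus the new PM_WAIT_CHECKPOINTER;
the per-state comments are my paraphrase):

    typedef enum
    {
        /* ... earlier states elided ... */
        PM_WAIT_BACKENDS,       /* waiting for live backends to exit */
        PM_WAIT_XLOG_SHUTDOWN,  /* waiting for checkpointer to do the shutdown checkpoint */
        PM_WAIT_XLOG_ARCHIVAL,  /* waiting for archiver and walsenders to finish */
        PM_WAIT_CHECKPOINTER,   /* waiting for checkpointer to shut down */
        PM_WAIT_DEAD_END,       /* waiting for dead-end children to exit */
        PM_NO_CHILDREN,         /* all important children have exited */
    } PMState;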
I also noticed that checkpointer.c's intro comment needed some editing.
Attached are the rebased and edited last two patches.
Independent of this patch, we seem to handle FatalError processing somewhat
inconsistently. We have this comment (in master):
    /*
     * PM_WAIT_DEAD_END state ends when all other children are gone except
     * for the logger. During normal shutdown, all that remains are
     * dead-end backends, but in FatalError processing we jump straight
     * here with more processes remaining. Note that they have already
     * been sent appropriate shutdown signals, either during a normal
     * state transition leading up to PM_WAIT_DEAD_END, or during
     * FatalError processing.
However, that doesn't actually appear to be true for the most common way to
reach FatalError, via HandleChildCrash():
    if (Shutdown != ImmediateShutdown)
        FatalError = true;

    /* We now transit into a state of waiting for children to die */
    if (pmState == PM_RECOVERY ||
        pmState == PM_HOT_STANDBY ||
        pmState == PM_RUN ||
        pmState == PM_STOP_BACKENDS ||
        pmState == PM_WAIT_XLOG_SHUTDOWN)
        UpdatePMState(PM_WAIT_BACKENDS);
Here we

a) don't directly go to PM_WAIT_DEAD_END, but to PM_WAIT_BACKENDS.

   From PM_WAIT_BACKENDS we'll only switch to PM_WAIT_DEAD_END once all
   processes other than walsender, archiver and dead-end children have exited.

b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) go *back* to
   PM_WAIT_BACKENDS?
This comment seems to have been added in
commit a78af0427015449269fb7d9d8c6057cfcb740149
Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Date: 2024-11-14 16:12:28 +0200
Assign a child slot to every postmaster child process
ISTM that section of the comment is largely wrong and, if my git sleuthing is
correct, has never really been accurate?
We do have one place that directly switches to PM_WAIT_DEAD_END:
    /*
     * If we failed to fork a checkpointer, just shut down.
     * Any required cleanup will happen at next restart. We
     * set FatalError so that an "abnormal shutdown" message
     * gets logged when we exit.
     *
     * We don't consult send_abort_for_crash here, as it's
     * unlikely that dumping cores would illuminate the reason
     * for checkpointer fork failure.
     */
    FatalError = true;
    UpdatePMState(PM_WAIT_DEAD_END);
    ConfigurePostmasterWaitSet(false);
but I suspect that's just about unreachable these days. If checkpointer exits
outside of shutdown processing, it's treated as a reason to crash-restart:
    /*
     * Any unexpected exit of the checkpointer (including FATAL
     * exit) is treated as a crash.
     */
    HandleChildCrash(pid, exitstatus,
                     _("checkpointer process"));
During postmaster's first start, checkpointer is launched before the startup
process is even started. During crash-restart processing, however, we start
the startup process first and only launch checkpointer later, from
LaunchMissingBackgroundProcesses().
It looks like it's theoretically possible that recovery completes before we
ever reach LaunchMissingBackgroundProcesses() - but it seems like we should
address that by having the same "process startup order" during crash-recovery
processing as we have during the first start?
I think this oddity originated in
commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
Author: Thomas Munro <tmunro(at)postgresql(dot)org>
Date: 2021-08-02 17:32:20 +1200
Run checkpointer and bgwriter in crash recovery.
ISTM that we should change this so that the first-start and crash-recovery
scenarios are more similar. Thomas, what do you think?
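Concretely, I was imagining something along these lines in the crash-restart
path of PostmasterStateMachine(), mirroring the first-start order (just a
sketch of the idea, not code from the attached patches; the exact placement is
an assumption):

    /* launch checkpointer and bgwriter first, as we do on first start */
    CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
    BgWriterPID = StartChildProcess(B_BG_WRITER);

    /* only then kick off crash recovery */
    StartupPID = StartChildProcess(B_STARTUP);
    Assert(StartupPID != 0);
    StartupStatus = STARTUP_RUNNING;
    UpdatePMState(PM_STARTUP);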
Separately, is it a problem that somewhere in process_pm_* we did
ConfigurePostmasterWaitSet(false) but then the event loop in ServerLoop()
continues to process "old" WL_SOCKET_ACCEPT events?
Greetings,
Andres Freund
Attachments:
  v3-0001-checkpointer-Request-checkpoint-via-latch-instead.patch (text/x-diff, 4.6 KB)
  v3-0002-Change-shutdown-sequence-to-terminate-checkpointe.patch (text/x-diff, 20.0 KB)