Quick Links

Reorder shutdown sequence, to flush pgstats later

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Reorder shutdown sequence, to flush pgstats later
Date:	2025-01-08 02:10:50
Message-ID:	kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

In the AIO patchset I just hit a corner case in which IO workers can emit
stats ([1]). That in turn can trigger an assertion failure, because by the
time the IO workers are shut down, checkpointer already has exited and written
out the stats. In this case the relevant IO has completed, but nothing
triggered a stats flush in the IO workers before checkpointer exited.

I think Bilal has also hit this problem somewhat recently, as part of his work
to track WAL IO in pg_stat_io. We intentionally allow walsenders to be active
after checkpointer writes out the shutdown checkpoint and exits.

Robert suggested, when chatting with him about this problem, that we could use
a global barrier to trigger pending stats to be flushed before checkpointer
exits. While that, I think, would fix "my" issue with pending stats in IO
workers, I don't think it'd fix Bilal's problem.

I think that to properly fix this we need a place to flush stats after
everything has shut down.

One way we could do this would be to introduce a new aux process that's
started at the end of the shutdown sequence. But that seems a bit too big a
hammer, without corresponding benefit.

I instead opted to somewhat rearrange the shutdown sequence:

This commit changes the shutdown sequence so checkpointer is signalled to
trigger writing the shutdown checkpoint without terminating it. Once
checkpointer wrote the checkpoint it will wait for a termination
signal.

Postmaster now triggers the shutdown checkpoint where we previously did so by
terminating checkpointer. Checkpointer is terminated after all other children
have been terminated, tracked using the new PM_SHUTDOWN_CHECKPOINTER PMState.

In addition, the existing PM_SHUTDOWN_2 state is renamed to
PM_XLOG_IS_SHUTDOWN, that seems a lot more descriptive.

This patch is not fully polished, there are a few details I'm not sure about:

- Previously the shutdown checkpoint and process exit were triggered from
HandleCheckpointerInterrupts(). I could have continued doing so in the
patch, but it seemed quite weird to have a wait loop in a function called
HandleCheckpointerInterrupts().

Instead I made the main loop in CheckpointerMain() terminate if checkpointer
was asked to write the shutdown checkpoint or exit

- In process_pm_pmsignal() the code now triggers a PostmasterStateMachine() if
checkpointer signalled ShutdownXLOG having completed. We didn't really have
a need for that before. process_pm_child_exit() does call
PostmasterStateMachine(), but somehow calling it in process_pm_pmsignal()
feels slightly off

- I don't love the naming of the various PMState values (existing and new),
but a larger renaming should probably be done separately?

- There's logging in ShutdownXLOG() that's only triggered if called from
checkpointer. Seems kinda odd - shouldn't we just move that to
checkpointer.c?

The first two patches are just logging improvements that I found helpful to
write this patch:

0001: Instead of modifying pmState directly, do so via a new function
UpdatePMState(), which logs the state transition at DEBUG2.

Requires a helper to stringify PMState.

I've written one-off versions of this patch quite a few times. Without
knowing in what state postmaster is in, it's quite hard to debug the
state machine.

0002: Make logging of postmaster signalling child processes more consistent

I found it somewhat hard to understand what's happening during state
changes without being able to see the signals being sent. While we did
have logging in SignalChildren(), we didn't in signal_child(), and most
signals that are important for the shutdown sequence are sent via
signal_child().

The next is a small prerequisite:

0003: postmaster: Introduce variadic btmask_all_except()

I proposed this in another thread, where I also needed
btmask_all_except3() but then removed the only call to
btmask_all_except2(). Instead of adding/removing code, it seems better
to just use a variadic function.

And then 0004, the reason for this thread.

Comments? Better ideas?

Greetings,

Andres Freund

[1] A buffered write completes and RememberSyncRequest() fails, leading to the
IO workers performing the flush itself.

Attachment	Content-Type	Size
v1-0001-postmaster-Update-pmState-in-new-function.patch	text/x-diff	7.6 KB
v1-0002-postmaster-Improve-logging-of-signals-sent-by-pos.patch	text/x-diff	4.3 KB
v1-0003-postmaster-Introduce-variadic-btmask_all_except.patch	text/x-diff	1.9 KB
v1-0004-WIP-Change-shutdown-sequence-to-terminate-checkpo.patch	text/x-diff	17.6 KB

Responses

Re: Reorder shutdown sequence, to flush pgstats later at 2025-01-08 13:10:03 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michail Nikolaev	2025-01-08 02:12:00	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message	Tomas Vondra	2025-01-08 01:03:25	Re: Proposal: Progressive explain