Re: Reorder shutdown sequence, to flush pgstats later

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Reorder shutdown sequence, to flush pgstats later
Date: 2025-01-08 19:26:15
Message-ID: fmbjdqegjjmicoxtht3kph4uutmzsu4y63oohg5h3iaqyvgpci@wqcwl6x6ieus
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-01-08 15:10:03 +0200, Heikki Linnakangas wrote:
> On 08/01/2025 04:10, Andres Freund wrote:
> > I instead opted to somewhat rearrange the shutdown sequence:
> > - I don't love the naming of the various PMState values (existing and new),
> > but a larger renaming should probably be done separately?
>
> We currently have PM_WAIT_BACKENDS and PM_WAIT_DEAD_END. Maybe follow that
> example and name all the shutdown states after what you're waiting for:
>
> PM_WAIT_BACKENDS, /* waiting for live backends to exit */
> PM_WAIT_XLOG_SHUTDOWN, /* waiting for checkpointer to do shutdown ckpt
> */
> PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to
> finish */
> PM_WAIT_DEAD_END, /* waiting for dead-end children to exit */
> PM_WAIT_CHECKPOINTER, /* waiting for checkpointer to shut down */
> PM_NO_CHILDREN, /* all important children have exited */

I think this is much better than before. I don't love PM_WAIT_XLOG_ARCHIVAL,
but I can't think of anything better.

It's probably best to break it out into a separate commit, see attached.

> > - There's logging in ShutdownXLOG() that's only triggered if called from
> > checkpointer. Seems kinda odd - shouldn't we just move that to
> > checkpointer.c?
>
> What logging do you mean? The "LOG: shutting down" message? No objections to
> moving it, although it doesn't bother me either way.

Yea. It's also an oddly unspecific message :).

> > The first two patches are just logging improvements that I found helpful to
> > write this patch:
> >
> > 0001: Instead of modifying pmState directly, do so via a new function
> > UpdatePMState(), which logs the state transition at DEBUG2.
> >
> > Requires a helper to stringify PMState.
> >
> > I've written one-off versions of this patch quite a few times. Without
> > knowing in what state postmaster is in, it's quite hard to debug the
> > state machine.
>
> +1
>
> > elog(DEBUG1, "updating PMState from %d/%s to %d/%s",
> > pmState, pmstate_name(pmState), newState, pmstate_name(newState));
>
> I think the state names are enough, no need to print the numerical values.

I thought the numeric value was mildly helpful as the numerically higher value
makes it slightly easier to understand the "direction" of state changes. But I
also wasn't sure it's worth it. Removed in the attached.

> > 0002: Make logging of postmaster signalling child processes more consistent
> >
> > I found it somewhat hard to understand what's happening during state
> > changes without being able to see the signals being sent. While we did
> > have logging in SignalChildren(), we didn't in signal_child(), and most
> > signals that are important for the shutdown sequence are sent via
> > signal_child().
>
> +1
>
> I don't think the cases where DEBUG2 or DEBUG4 are used currently are very
> well thought out. To save a few lines of code, I'd be happy with merging
> signal_child_ext() and signal_child() and just always use DEBUG2 or DEBUG4.
> Or DEBUG3 as a compromise :-).

Heh. DEBUG2 seems a bit much compared to some other stuff, e.g. a config
reload will be logged at DEBUG2, seems a bit superfluous to then also use
DEBUG2 for all individual signals.

I was a bit hesitant to lower the level in sigquit_child(), but thinking about
it, the fact that we're gonna signal children is already obvious from the log.

So I went for your compromise position :)

> > The next is a small prerequisite:
> >
> > 0003: postmaster: Introduce variadic btmask_all_except()
> >
> > I proposed this in another thread, where I also needed
> > btmask_all_except3() but then removed the only call to
> > btmask_all_except2(). Instead of adding/removing code, it seems better
> > to just use a variadic function.
>
> Nice. A variadic btmask_add() might be handy too.

It does seem to make the code a bit more readable. See attached. Wasn't quite
sure in how many btmask_add() invocations to break it up...

> > And then 0004, the reason for this thread.

One problem I found is this comment:
* Note: we deliberately ignore SIGTERM, because during a standard Unix
* system shutdown cycle, init will SIGTERM all processes at once. We
* want to wait for the backends to exit, whereupon the postmaster will
* tell us it's okay to shut down (via SIGUSR2).

Which would cause problems with v1 of the patch, as postmaster would write a
shutdown checkpoint in response to SIGTERM.

Unfortunately we don't have any obvious unused signals available, as SIGINT is
used to request a checkpoint.

To deal with that I added a new patch to the stack, which makes
RequestCheckpoint() use SetLatch() on checkpointer's latch to wake it up,
instead of kill(SIGINT). Not that that matters, but using SetLatch()
directly, instead of doing SetLatch() in the signal handler, is slightly more
efficient, because it avoids a signal interrupt when checkpointer isn't
waiting on a latch.

The relevant comment mentioned "or is in process of restarting" - but that's
afaict bogus, we don't allow checkpointer to restart.

After that, postmaster can use SIGINT to request checkpointer to write the
shutdown checkpoint.

I'm currently to plan the four patches relatively soon, unless somebody speaks
up, of course. They seem fairly uncontroversial. The renaming of the phases
doesn't need to wait much longer, I think.

The last two (0006: trigger checkpoints via SetLatch, 0007: change the
shutdown sequence), probably can use a few more eyes.

Greetings,

Andres Freund

Attachment Content-Type Size
v2-0001-postmaster-Update-pmState-in-new-function.patch text/x-diff 7.8 KB
v2-0002-postmaster-Improve-logging-of-signals-sent-by-pos.patch text/x-diff 4.2 KB
v2-0003-postmaster-Introduce-variadic-btmask_all_except.patch text/x-diff 1.9 KB
v2-0004-postmaster-Make-btmask_add-variadic.patch text/x-diff 3.6 KB
v2-0005-postmaster-Rename-some-shutdown-related-PMState-p.patch text/x-diff 6.9 KB
v2-0006-checkpointer-Request-checkpoint-via-latch-instead.patch text/x-diff 4.5 KB
v2-0007-Change-shutdown-sequence-to-terminate-checkpointe.patch text/x-diff 18.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-01-08 19:28:33 Re: Virtual generated columns
Previous Message Vik Fearing 2025-01-08 19:23:35 Re: Virtual generated columns