Re: Reorder shutdown sequence, to flush pgstats later

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Reorder shutdown sequence, to flush pgstats later
Date: 2025-01-08 13:10:03
Message-ID: d2cd8fd3-396a-4390-8f0b-74be65e72899@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/01/2025 04:10, Andres Freund wrote:
> I instead opted to somewhat rearrange the shutdown sequence:
>
> This commit changes the shutdown sequence so checkpointer is signalled to
> trigger writing the shutdown checkpoint without terminating it. Once
> checkpointer wrote the checkpoint it will wait for a termination
> signal.
>
> Postmaster now triggers the shutdown checkpoint where we previously did so by
> terminating checkpointer. Checkpointer is terminated after all other children
> have been terminated, tracked using the new PM_SHUTDOWN_CHECKPOINTER PMState.
>
> In addition, the existing PM_SHUTDOWN_2 state is renamed to
> PM_XLOG_IS_SHUTDOWN, that seems a lot more descriptive.

Sounds good.

> This patch is not fully polished, there are a few details I'm not sure about:
>
> - Previously the shutdown checkpoint and process exit were triggered from
> HandleCheckpointerInterrupts(). I could have continued doing so in the
> patch, but it seemed quite weird to have a wait loop in a function called
> HandleCheckpointerInterrupts().
>
> Instead I made the main loop in CheckpointerMain() terminate if checkpointer
> was asked to write the shutdown checkpoint or exit
>
> - In process_pm_pmsignal() the code now triggers a PostmasterStateMachine() if
> checkpointer signalled ShutdownXLOG having completed. We didn't really have
> a need for that before. process_pm_child_exit() does call
> PostmasterStateMachine(), but somehow calling it in process_pm_pmsignal()
> feels slightly off

Looks good to me.

> - I don't love the naming of the various PMState values (existing and new),
> but a larger renaming should probably be done separately?

We currently have PM_WAIT_BACKENDS and PM_WAIT_DEAD_END. Maybe follow
that example and name all the shutdown states after what you're waiting for:

PM_WAIT_BACKENDS, /* waiting for live backends to exit */
PM_WAIT_XLOG_SHUTDOWN, /* waiting for checkpointer to do shutdown
ckpt */
PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to
finish */
PM_WAIT_DEAD_END, /* waiting for dead-end children to exit */
PM_WAIT_CHECKPOINTER, /* waiting for checkpointer to shut down */
PM_NO_CHILDREN, /* all important children have exited */

> - There's logging in ShutdownXLOG() that's only triggered if called from
> checkpointer. Seems kinda odd - shouldn't we just move that to
> checkpointer.c?

What logging do you mean? The "LOG: shutting down" message? No
objections to moving it, although it doesn't bother me either way.

> The first two patches are just logging improvements that I found helpful to
> write this patch:
>
> 0001: Instead of modifying pmState directly, do so via a new function
> UpdatePMState(), which logs the state transition at DEBUG2.
>
> Requires a helper to stringify PMState.
>
> I've written one-off versions of this patch quite a few times. Without
> knowing in what state postmaster is in, it's quite hard to debug the
> state machine.

+1

> elog(DEBUG1, "updating PMState from %d/%s to %d/%s",
> pmState, pmstate_name(pmState), newState, pmstate_name(newState));

I think the state names are enough, no need to print the numerical values.

> 0002: Make logging of postmaster signalling child processes more consistent
>
> I found it somewhat hard to understand what's happening during state
> changes without being able to see the signals being sent. While we did
> have logging in SignalChildren(), we didn't in signal_child(), and most
> signals that are important for the shutdown sequence are sent via
> signal_child().

+1

I don't think the cases where DEBUG2 or DEBUG4 are used currently are
very well thought out. To save a few lines of code, I'd be happy with
merging signal_child_ext() and signal_child() and just always use DEBUG2
or DEBUG4. Or DEBUG3 as a compromise :-).

> The next is a small prerequisite:
>
> 0003: postmaster: Introduce variadic btmask_all_except()
>
> I proposed this in another thread, where I also needed
> btmask_all_except3() but then removed the only call to
> btmask_all_except2(). Instead of adding/removing code, it seems better
> to just use a variadic function.

Nice. A variadic btmask_add() might be handy too.

> And then 0004, the reason for this thread.

Overall, looks good to me.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2025-01-08 13:35:51 RE: ecpg command does not warn COPY ... FROM STDIN;
Previous Message Jakub Wartak 2025-01-08 12:49:38 Re: doc: Mention clock synchronization recommendation for hot_standby_feedback