Re: Managing IO workers in postmaster's state machine

From: Cédric Villemain <Cedric(dot)Villemain(at)Data-Bene(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Managing IO workers in postmaster's state machine
Date: 2024-12-13 09:59:20
Message-ID: 1e16b1f8-c79c-4ce0-b516-7d51b88b9825@Data-Bene.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

Thanks for sharing these detailed questions and insights. Below are some
comments and additional thoughts that might help, particularly regarding
bgworkers.

On 09/12/2024 22:42, Andres Freund wrote:

> Hi,
>
> For AIO, when using the worker process based "AIO", we need to manage a set of
> workers. These need to be started by postmaster, obviously.
>
> I have a few questions about the design details. They seemed mostly
> independent of the larger AIO thread, so I started this separately.
>
>
> 1) Where to start/stop workers after a config reload?
>
> The GUC that controls how many workers are allowed is PGC_SIGHUP. Therefore we
> need to adjust the number of workers after a config reload.
>
> Right now this is implemented (I think by Thomas, but it could have been me),
> in a GUC assign hook. But that seems rather iffy.
>
> For example, I think this means that, as we're in the middle of the GUC reload
> process, a child forked in that moment may have some updated GUCs, but not
> others. But even besides that, it "feels wrong", in a way I can't fully
> articulate.
>
> However, it's not exactly obvious *where* workers should be started / stopped
> instead.
>
> One approach would be to put the adjustment in PostmasterStateMachine(), but
> currently a config reload request doesn't trigger an invocation of
> PostmasterStateMachine().
>
> We have a somewhat similar case for bgworkers, for those we have a dedicated
> call in ServerLoop(). I guess we could just copy that, but it doesn't seem
> exactly pretty.

I am unsure why background workers are managed directly in the
ServerLoop(), wouldn't it help if a "background worker launcher" exist
instead ?

> 2) New state machine phase for AIO shutdown?
>
> During non-crash shutdown, we currently stop archiver and walsender last
> (except dead-end children, but those don't matter in this context). But at
> least walsender might use AIO, so we need to shut down AIO after walsender.
>
> We already have PM_SHUTDOWN, PM_SHUTDOWN_2. I didn't want to introduce
> PM_SHUTDOWN_3. For now there's a new PM_SHUTDOWN_IO.
>
> Is that the right approach?
>
> But somehow the number of shutdown phases seems to be getting out of control,
> besides PM_SHUTDOWN* we also have PM_STOP_BACKENDS, PM_WAIT_BACKENDS,
> PM_WAIT_DEAD_END and PM_NO_CHILDREN...

For the startup there is a StartupStatusEnum, it serves distinct purpose
but maybe having ShutdownSequence/ShutdownStatus can be helpful: just
outline order of operations for the shutdown state ? Maybe nearest from
a chain of responsability than the existing state machine...

> 3) Do we need an new PM* phase at startup?
>
> Because the startup process - obviously - can perform IO, IO workers need to
> be started before the startup process is started.
>
> We already do this with checkpointer and bgwriter. For those we don't have a
> new phase, we just do so before forking the startup process. That means that
> the startup process might need to wait for checkpointer to finish starting up,
> but that seems fine to me. The same seems true for IO workers.
>
> Therefore I don't see a need for a new phase?

So it should belong to PM_INIT? but the next question:

> 4) Resetting pmState during crash-restart?
>
> For the IO worker handling we need to check the phase we currently are in to
> know whether to start new workers (so we don't start more workers while
> shutting down).
>
> That's easy during normal startup, the state is PM_INIT. But for a
> crash-restart (bottom of PostmasterStateMachine()), pmState will be
> PM_NO_CHILDREN - in which we would *NOT* want to start new IO workers normally
> (if e.g. a SIGHUP is processed while in that state). But we do need to start
> IO workers at that point in the crash-restart cycle.
>
> The AIO patchset has dealt with that by moving pmState = PM_STARTUP to a bit
> earlier. But that seems a *bit* weird, even though I don't see any negative
> consequences right now.
>
> Would it be better to reset pmState to PM_INIT? That doesn't seem quite right
> either, it's described as "postmaster starting".

The comments in postmaster.c said that crash recovery is like shutdown
followed by startup (not by init).
If I understood correctly IO workers will need to be restarted in such
context, so they must be after PM_INIT ...

> 5) btmask_all_except2() not used anymore, btmask_all_except3() needed
>
> The one place using btmask_all_except2() now also needs to allow IO workers,
> and thus btmask_all_except3(). Which triggers warnings about
> btmask_all_except2 being unused.
>
> We could remove btmask_all_except2(), ifdef it out or create a more generic
> version that works with arbitrary numbers of arguments.
>
> Something like
>
> static inline BackendTypeMask
> btmask_all_except_n(int nargs, BackendType *t)
> {
> BackendTypeMask mask = BTYPE_MASK_ALL;
>
> for (int i = 0; i < nargs; i++)
> mask = btmask_del(mask, t[i]);
> return mask;
> }
>
> #define btmask_all_except(...) \
> btmask_all_except_n( \
> lengthof(((BackendType[]){__VA_ARGS__})), \
> (BackendType[]){__VA_ARGS__} \
> )
>
> should do the trick, I think.
>
>
> 6) Variable numbered aux processes
>
> The IO workers are aux processes - that seemed to be the most
> appropriate. However, if "real" AIO is configured (PGC_POSTMASTER), no IO
> workers can be started.
>
> The only change the patch had to make for that is:
>
> -#define NUM_AUXILIARY_PROCS 6
> +#define MAX_IO_WORKERS 32
> +#define NUM_AUXILIARY_PROCS (6 + MAX_IO_WORKERS)
>
> Which means that we'll reserve some resources for IO workers even if no IO
> workers can be started. Do we consider that a problem? There's some precedent
> - we reserve space for archiver even if not configured, despite archive_mode
> being PGC_POSTMASTER - but of course that's a different scale.

I have not checked the history, I have read some details on this topic,
but the reasoning is not yet very clear to me as **why** this is
hardcoded.  If there's a technical or historical reason for the current
design, it would be useful to revisit that in this context.
What prevent the number of background workers to be dynamically sized
today ?

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2024-12-13 10:03:43 Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.
Previous Message Thomas Munro 2024-12-13 09:44:06 Re: Remaining dependency on setlocale()