Re: walsender.c fileheader comment

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: walsender.c fileheader comment
Date: 2024-07-16 20:56:11
Message-ID: cdf6841c-a283-4f41-b0b0-a46819c827fa@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/11/24 04:35, Peter Smith wrote:
> Hi,
>
> I was reading the walsender.c fileheader comment while studying
> another thread. I think if there is logical replication in progress
> then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
> a "stopping" state: e.g.,
>
> /*
> * Handle PROCSIG_WALSND_INIT_STOPPING signal.
> */
> void
> HandleWalSndInitStopping(void)
> {
> Assert(am_walsender);
>
> /*
> * If replication has not yet started, die like with SIGTERM. If
> * replication is active, only set a flag and wake up the main loop. It
> * will send any outstanding WAL, wait for it to be replicated to the
> * standby, and then exit gracefully.
> */
> if (!replication_active)
> kill(MyProcPid, SIGTERM);
> else
> got_STOPPING = true;
> }
>
> ~~~
>
> But the walsender.c fileheader comment seems to be saying something
> slightly different. IIUC, some minor rewording of the comment is
> needed so it describes the code better.
>
> HEAD
> ...
> * shutdown, if logical replication is in progress all existing WAL records
> * are processed followed by a shutdown. Otherwise this causes the walsender
> * to switch to the "stopping" state. In this state, the walsender will reject
> * any further replication commands. The checkpointer begins the shutdown
> ...
>
> SUGGESTION
> .. shutdown. If logical replication is in progress, the walsender
> switches to a "stopping" state. In this state, the walsender will
> reject any further replication commands - but all existing WAL records
> are processed - followed by a shutdown.
>
> ~~~
>
> I attached a patch for the above-suggested change.
>
> Thoughts?

I did look at this, and while the explanation in the current comment may
seem a bit confusing, I'm not sure the suggested changes improve the
situation very much.

This suggests the two comments somehow disagree, but it does not say in
what exactly, so perhaps I just missed it :-(

ISTM there's a bit of confusion what is meant by "stopping" state - you
seem to be interpreting it as a general concept, where the walsender is
requested to stop (through the signal), and starts doing stuff to exit.
But the comments actually talk about WalSnd->state, where "stopping"
means it needs to be set to WALSNDSTATE_STOPPING.

And we only ever switch to that state in two places - in WalSndPhysical
and exec_replication_command. And that does not happen in regular
logical replication (which is what "logical replication is in progress"
refers to) - if you have a walsender just replicating DML, it will never
see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
still in WALSNDSTATE_STREAMING state, and then just exit.

So from this point of view, the suggestion is actually wrong.

To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2024-07-16 21:11:47 Re: PG_TEST_EXTRA and meson
Previous Message Tristan Partin 2024-07-16 20:53:45 Re: Meson far from ready on Windows