On Thu, Mar 4, 2021 at 11:02 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Wed, Mar 3, 2021 at 8:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 2, 2021 at 7:22 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > Why do we need to move promote related code in XLogAcceptWrites?
> > > IMHO, this promote related handling should be in StartupXLOG only.
> > > That will look cleaner.
> >
> > The key design question here, at least in my mind, is what exactly
> > happens after prohibit-WAL + system-crash + recovery-finishes. We
> > clearly can't write the checkpoint or end-of-recovery record and
> > proceed with business as usual, but are we still in recovery? Either
> > (1) we are technically still in recovery, stopping just short of
> > entering normal running, and will emerge from recovery when WAL is
> > permitted again; or (2) we have technically finished recovery, but
> > deferred some of the actions that would normally occur at that time
> > until a later point. Maybe this is academic distinction as much as
> > anything, but the idea is if we choose #1 then we should do as little
> > as possible at the point when recovery finishes and defer as much as
> > possible until we actually enter normal running; whereas if we choose
> > #2 we should do as much as possible at the point when recovery
> > finishes and defer only those things which absolutely have to be
> > deferred. That said, I and I think also Andres are voting for #2.
> >
> > But if we go that way, that precludes what you are proposing here. If
> > we picked #1 then it would be natural for the startup process to
> > remain active and the control file update to be postponed until WAL
> > writes are re-enabled; but under model #2 we want, if possible, for
> > the startup process to exit and the control file update to happen
> > normally, and only the writing of the actual WAL records to be
> > deferred.
> >
>
> Current patch doing a mix of both, startup process exits without doing
> WAL writes and control file updates, that happens later when system
> changes to read-write.
>
> > What I find much odder, looking at the present patch, is that
> > PerformPendingStartupOperations() gets called from pg_prohibit_wal()
> > rather than by the checkpointer. If the checkpointer is the process
> > that is in charge of coordinating the change between a read-only state
> > and a read-write state, then it ought to also do this. I also think
> > that the PerformPendingStartupOperations() wrapper is unnecessary.
> > Just invert the sense of the XLogCtl flag: xlogAllowWritesDone, rather
> > than startupCrashRecoveryPending, and have XLogAcceptWrites() set it
> > (and return without doing anything if it's already set). Then the
> > checkpointer can just call the function unconditionally whenever we go
> > read-write, and for a bonus we will have much better naming
> > consistency, rather than calling the same thing "xlog accept writes"
> > in one place, "pending startup operations" in another, and "startup
> > crash recovery pending" in a third.
> >
>
> Ok, in the attached version, I have used the xlogAllowWritesDone variable.
> To match the naming sense, it should be set to 'false' initially and
> should get set to 'true' when the XLogAcceptWrites() operation completes.
>
> I have removed the PerformPendingStartupOperations() wrapper function and I have
> slightly changed XLogAcceptWrites() to minimize its parameter count so that it
> can use available global variable values instead of parameters. Unfortunately,
> it cannot be called from checkpointer unconditionally, it will create a race
> with startup process when startup process still in recovery and checkpointer
> launches and see that xlogAllowWritesDone = false, will go-ahead for those wal
> write operations and end-of-recovery checkpoint which will be a
> disaster. Therefore, I moved this XLogAcceptWrites() function inside
> ProcessWALProhibitStateChangeRequest() and called when the system is in
> GOING_READ_WRITE transition state. Since ProcessWALProhibitStateChangeRequest()
> gets called from a different places of checkpointer process which creates a
> cascaded call to XLogAcceptWrites() function, to avoid that I am updating
> xlogAllowWritesDone = true immediately after it gets checked in
> XLogAcceptWrites() which I think is not the right approach, technically, it
> should be updated at the end of XLogAcceptWrites().
>
> I think instead of xlogAllowWritesDone, we should use invert of it, as
> the previous, e.g.xlogAllowWritesPending or xlogAllowWritesSkipped or something
> else and that will be get explicitly set 'true' when we skip XLogAcceptWrites()
> call. That will avoid the race of checkpointer process with the startup since
> initially, it will be 'false', and if it is 'false' we will return immediately
> from XLogAcceptWrites(). Also, we don't need to move XLogAcceptWrites() inside
> ProcessWALProhibitStateChangeRequest(), it can be called from checkpointerMain()
> loop which also avoids cascade calls and we don't need to update it until we
> complete those write operations. Thoughts/Comments?
>
In the attached version, I am able to fix most of the concerns that I had. Right
now, having the xlogAllowWritesDone variable is fine, and that will get updated
at the end of the XLogAcceptWrites() function, unlike the previous.
XLogAcceptWrites() will be called from ProcessWALProhibitStateChangeRequest()
while the system state changes to read-write, like previous. Now to avoid the
recursive call to ProcessWALProhibitStateChangeRequest() from the
end-of-recovery checkpoint happening in XLogAcceptWrites(), I have added a
private boolean state variable in walprohibit.c, using it wal prohibit state
transition can be put on hold for some time; did the same while calling
XLogAcceptWrites().
Regards,
Amul