Quick Links

Re: [Patch] ALTER SYSTEM READ ONLY

From:	Amul Sul <sulamul(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject:	Re: [Patch] ALTER SYSTEM READ ONLY
Date:	2021-10-18 13:54:13
Message-ID:	CAAJ_b94KNpGXzeEuSHTvrMzkRP0Vp-uveTG604GS_up+paPfbQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Oct 14, 2021 at 11:10 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Oct 12, 2021 at 8:18 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > In the attached version I have fixed this issue by restoring missingContrecPtr.
> >
> > To handle abortedRecPtr and missingContrecPtr newly added global
> > variables thought the commit # ff9f111bce24, we don't need to store
> > them in the shared memory separately, instead, we need a flag that
> > indicates a broken record found previously, at the end of recovery, so
> > that we can overwrite contrecord.
> >
> > The missingContrecPtr is assigned to the EndOfLog, and we have handled
> > EndOfLog previously in the 0004 patch, and the abortedRecPtr is the
> > same as the lastReplayedEndRecPtr, AFAICS. I have added an assert to
> > ensure that the lastReplayedEndRecPtr value is the same as the
> > abortedRecPtr, but I think that is not needed, we can go ahead and
> > write an overwrite-contrecord starting at lastReplayedEndRecPtr.
>
> I thought that it made sense to commit 0001 and 0002 at this point, so
> I have done that. I think that the treatment of missingContrecPtr and
> abortedRecPtr may need more thought yet, so at least for that reason I
> don't think it's a good idea to proceed with 0004 yet. 0003 is just
> code movement so I guess that can be committed whenever we're
> confident that we know exactly which things we want to end up inside
> XLogAcceptWrites().
>

Ok.

> I do have a few ideas after studying this a bit more:
>
> - I wonder whether, in addition to moving a few things later as 0002
> did, we also ought to think about moving one thing earlier,
> specifically XLogReportParameters(). Right now, we have, I believe,
> four things that write WAL at the end of recovery:
> CreateOverwriteContrecordRecord(), UpdateFullPageWrites(),
> PerformRecoveryXLogAction(), and XLogReportParameters(). As the code
> is structured now, we do the first three of those things, and then do
> a bunch of other stuff inside CleanupAfterArchiveRecovery() like
> running recovery_end_command, and removing non-parent xlog files, and
> archiving the partial segment, and then come back and do the fourth
> one. Is there any good reason for that? If not, I think doing them all
> together would be cleaner, and would propose to reverse the order of
> CleanupAfterArchiveRecovery() and XLogReportParameters().
>

Yes, that can be done.

> - If we did that, then I would further propose to adjust things so
> that we remove the call to LocalSetXLogInsertAllowed() and the
> assignment LocalXLogInsertAllowed = -1 from inside
> CreateEndOfRecoveryRecord(), the LocalXLogInsertAllowed = -1 from just
> after UpdateFullPageWrites(), and the call to
> LocalSetXLogInsertAllowed() just before XLogReportParameters().
> Instead, just let the call to LocalSetXLogInsertAllowed() right before
> CreateOverwriteContrecordRecord() remain in effect. There doesn't seem
> to be much point in flipping that switch off and on again, and the
> fact that we have been doing so is in my view just evidence that
> StartupXLOG() doesn't do a very good job of getting related code all
> into one place.
>

Currently there are three places that are calling
LocalSetXLogInsertAllowed() and resetting that LocalXLogInsertAllowed
flag as StartupXLOG(), CreateEndOfRecoveryRecord() and the
CreateCheckPoint(). By doing the aforementioned code rearrangement we
can get rid of frequent calls from StartupXLOG() and can completely
remove the need for it in CreateEndOfRecoveryRecord() since that gets
called only from StartupXLOG() directly. Whereas CreateCheckPoint()
too gets called from StartupXLOG() when it is running in a standalone
backend only, at that time we don't need to call
LocalSetXLogInsertAllowed() but if that running in the Checkpointer
process then we need that.

I tried this in the attached version, but I'm a bit skeptical with
changes that are needed for CreateCheckPoint(), those don't seem to be
clean. I am wondering if we could completely remove the need to end of
recovery checkpoint as proposed in [1], that would get rid of
CHECKPOINT_END_OF_RECOVERY operation and the
LocalSetXLogInsertAllowed() requirement in CreateCheckPoint(), and
after that, we were not expecting checkpoint operation in recovery. If
we could do that then we would have LocalSetXLogInsertAllowed() only
at one place i.e. in StartupXLOG (...and in the future in
XLogAcceptWrites()) -- the code that runs only once in a lifetime of
the server and the kludge that the attached patch doing for
CreateCheckPoint() will not be needed.

> - It seems really tempting to invent a fourth RecoveryState value that
> indicates that we are done with REDO but not yet in production, and
> maybe also to rename RecoveryState to something like WALState. I'm
> thinking of something like WAL_STATE_CRASH_RECOVERY,
> WAL_STATE_ARCHIVE_RECOVERY, WAL_STATE_REDO_COMPLETE, and
> WAL_STATE_PRODUCTION. Then, instead of having
> LocalSetXLogInsertAllowed(), we could teach XLogInsertAllowed() that
> the startup process and the checkpointer are allowed to insert WAL
> when the state is WAL_STATE_REDO_COMPLETE, but other processes only
> once we reach WAL_STATE_PRODUCTION. We would set
> WAL_STATE_REDO_COMPLETE where we now call LocalSetXLogInsertAllowed().
> It's necessary to include the checkpointer, or at least I think it is,
> because PerformRecoveryXLogAction() might call RequestCheckpoint(),
> and that's got to work. If we did this, then I think it would also
> solve another problem which the overall patch set has to address
> somehow. Say that we eventually move responsibility for the
> to-be-created XLogAcceptWrites() function from the startup process to
> the checkpointer, as proposed. The checkpointer needs to know when to
> call it ... and the answer with this change is simple: when we reach
> WAL_STATE_REDO_COMPLETE, it's time!
>
> But this idea is not completely problem-free. I spent some time poking
> at it and I think it's a little hard to come up with a satisfying way
> to code XLogInsertAllowed(). Right now that function calls
> RecoveryInProgress(), and if RecoveryInProgress() decides that
> recovery is no longer in progress, it calls InitXLOGAccess(). However,
> that presumes that the only reason you'd call RecoveryInProgress() is
> to figure out whether you should write WAL, which I don't think is
> really true, and it also means that, when the wal state is
> WAL_STATE_REDO_COMPLETE, RecoveryInProgress() would need to return
> true in the checkpointer and startup process and false everywhere
> else, which does not sound like a great idea. It seems fine to say
> that xlog insertion is allowed in some processes but not others,
> because not all processes are necessarily equally privileged, but
> whether or not we're in recovery is supposed to be something about
> which everyone agrees, so answering that question differently in
> different processes doesn't seem nice. XLogInsertAllowed() could be
> rewritten to check the state directly and make its own determination,
> without relying on RecoveryInProgress(), and I think that might be the
> right way to go here.
>
> But that isn't entirely problem-free either, because there's a lot of
> code that uses RecoveryInProgress() to answer the question "should I
> write WAL?" and therefore it's not great if RecoveryInProgress() is
> returning an answer that is inconsistent with XLogInsertAllowed().
> MarkBufferDirtyHint() and heap_page_prune_opt() are examples of this
> kind of coding. It probably wouldn't break in practice right away,
> because most of that code never runs in the startup process or the
> checkpointer and would therefore never notice the difference in
> behavior between those two functions, but if in the future we get the
> read-only feature that this thread is supposed to be about, we'd have
> problems. Not all RecoveryInProgress() calls have this sense - e.g.
> sendDir() in basebackup.c is trying to figure out whether recovery
> ended during the backup, not whether we can write WAL. But perhaps
> this is a good time to go and replace RecoveryInProgress() checks that
> are intending to decide whether or not it's OK to write WAL with
> XLogInsertAllowed() checks (noting that the return value is reversed).
> If we did that, then I think RecoveryInProgress() could also NOT call
> InitXLOGAccess(), and that could be done only by XLogInsertAllowed(),
> which seems like it might be better. But I haven't really tried to
> code all of this up, so I'm not really sure how it all works out.
>

I agree that calling InitXLOGAccess() from RecoveryInProgress() is not
good, but I am not sure about calling it from XLogInsertAllowed()
either, perhaps, both are status check function and general
expectations might be that status checking functions are not going
change and/or initialize the system state. InitXLOGAccess() should
get called from the very first WAL write operation if needed, but if
we don't want to do that, then I would prefer to call InitXLOGAccess()
from XLogInsertAllowed() instead of RecoveryInProgress().

As said before, if we were able to get rid of the need to
end-of-recovery checkpoint [1] then we don't need separate handling in
XLogInsertAllowed() for the Checkpointer process, that would be much
cleaner and for the startup process, we would force
XLogInsertAllowed() return true by calling LocalSetXLogInsertAllowed()
for the time being as we are doing right now.

Regards,
Amul

1] "using an end-of-recovery record in all cases" :
https://postgr.es/m/CAAJ_b95xPx6oHRb5VEatGbp-cLsZApf_9GWGtbv9dsFKiV_VDQ@mail.gmail.com

Attachment	Content-Type	Size
POC-rearrange-code-to-remove-frequent-need-of-LocalS.patch	application/x-patch	2.6 KB

In response to

Re: [Patch] ALTER SYSTEM READ ONLY at 2021-10-14 17:40:12 from Robert Haas

Responses

Re: [Patch] ALTER SYSTEM READ ONLY at 2021-10-18 22:20:34 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Dunstan	2021-10-18 13:56:41	Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
Previous Message	Alvaro Herrera	2021-10-18 13:44:24	Re: try_relation_open and relation_open behave different.