Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-09-24 11:37:42
Message-ID: CAAJ_b96KcmmJw6rTEX_br485VYL1MAv89pJ31znctN8e6ri_cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 23, 2021 at 11:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Sep 20, 2021 at 11:20 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > Ok, understood, I have separated my changes into 0001 and 0002 patch,
> > and the refactoring patches start from 0003.
>
> I think it would be better in the other order, with the refactoring
> patches at the beginning of the series.
>

Ok, will do that. I did this other way to minimize the diff e.g.
deletion diff of RecoveryXlogAction enum and
DetermineRecoveryXlogAction(), etc.

> > In the 0001 patch, I have copied ArchiveRecoveryRequested to shared
> > memory as said previously. Coping ArchiveRecoveryRequested value to
> > shared memory is not really interesting, and I think somehow we should
> > reuse existing variable, (perhaps, with some modification of the
> > information it can store, e.g. adding one more enum value for
> > SharedRecoveryState or something else), thinking on the same.
> >
> > In addition to that, I tried to turn down the scope of
> > ArchiveRecoveryRequested global variable. Now, this is a static
> > variable, and the scope is limited to xlog.c file like
> > LocalXLogInsertAllowed and can be accessed through the newly added
> > function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let
> > me know what you think about the approach.
>
> I'm not sure yet whether I like this or not, but it doesn't seem like
> a terrible idea. You spelled UNKNOWN wrong, though, which does seem
> like a terrible idea. :-) "acccsed" is not correct either.
>
> Also, the new comments for ArchiveRecoveryRequested /
> ARCHIVE_RECOVERY_REQUEST_* are really not very clear. All you did is
> substitute the new terminology into the existing comment, but that
> means that the purpose of the new "unknown" value is not at all clear.
>

Ok, will fix those typos and try to improve the comments.

> Consider the following two patch fragments:
>
> + * SharedArchiveRecoveryRequested indicates whether an archive recovery is
> + * requested. Protected by info_lck.
> ...
> + * Remember archive recovery request in shared memory state. A lock is not
> + * needed since we are the only ones who updating this.
>
> These two comments directly contradict each other.
>

Okay, the first comment is not clear enough, I will fix that too. I
meant we don't need the lock now since we are the only one updating at
this point.

> + SpinLockAcquire(&XLogCtl->info_lck);
> + XLogCtl->SharedArchiveRecoveryRequested = false;
> + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN;
> + SpinLockRelease(&XLogCtl->info_lck);
>
> This seems odd to me. In the first place, there doesn't seem to be any
> value in clearing this -- we're just expending extra CPU cycles to get
> rid of a value that wouldn't be used anyway. In the second place, if
> somehow someone checked the value after this point, with this code,
> they might get the wrong answer, whereas if you just deleted this,
> they would get the right answer.
>

Previously, this flag was only valid in the startup process. But now
it will be valid for all the processes and will stay until the whole
server gets restarted. I don't want anybody to use this flag after the
cleanup point and just be sure I am explicitly cleaning this.

By the way, I also don't expect we should go with this approach. I
proposed this by referring to the PromoteIsTriggered() implementation,
but IMO, it is better to have something else since we just want to perform
archive cleanup operation, and most of the work related to archive
recovery was done inside the StartupXLOG().

Rather than the proposed design, I was thinking of adding one or two
more RecoveryState enums. And while skipping XLogAcceptsWrite() set
XLogCtl->SharedRecoveryState appropriately, so that we can easily
identify that the archive recovery was requested previously and now,
we need to perform its pending cleanup operation. Thoughts?

> > In 0002 patch is a mixed one where I tried to remove the dependencies
> > on global variables and local variables belonging to StartupXLOG(). I
> > am still worried about the InRecovery value that needs to be deduced
> > afterward inside XLogAcceptWrites(). Currently, relying on
> > ControlFile->state != DB_SHUTDOWNED check but I think that will not be
> > good for ASRO where we plan to skip XLogAcceptWrites() work only and
> > let the StartupXLOG() do the rest of the work as it is where it will
> > going to update ControlFile's DBState to DB_IN_PRODUCTION, then we
> > might need some ugly kludge to call PerformRecoveryXLogAction() in
> > checkpointer irrespective of DBState, which makes me a bit
> > uncomfortable.
>
> I think that replacing the if (InRecovery) test with if
> (ControlFile->state != DB_SHUTDOWNED) is probably just plain wrong. I
> mean, there are three separate places where we set InRecovery = true.
> One of those executes if ControlFile->state != DB_SHUTDOWNED, matching
> what you have here, but it also can happen if checkPoint.redo <
> RecPtr, or if read_backup_label is true and ReadCheckpointRecord
> returns non-NULL. Now maybe you're going to tell me that in those
> other two cases we can't reach here anyway, but I don't see off-hand
> why that should be true, and even if it is true, it seems like kind of
> a fragile thing to rely on. I think we need to rely on something in
> shared memory that is more explicitly connected to the question of
> whether we are in recovery.
>

No, this is the other way. I haven't picked (ControlFile->state !=
DB_SHUTDOWNED) condition because it setting InRecovery, rather, I
picked because InRecovery flag is setting ControlFile->state to either
DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, see next if
(InRecovery) block after InRecovery flag gets set. It is certain that
when the system will be InRecovery, it will have the DBState other
than DB_SHUTDOWNED. But that isn't a clean approach for me because
when it will be in WAL prohibited the DBState will be DB_IN_PRODUCTION
which will not work, as I mentioned previously.

I am too thinking about passing this information via shared memory but
trying to somehow avoid this, lets see.

> The other part of this patch has to do with whether we can use the
> return value of GetLastSegSwitchData as a substitute for relying on
> EndOfLog. Now as you have it, you end up creating a local variable
> called EndOfLog that shadows another such variable in an outer scope,
> which probably would not make anyone who noticed things in such a
> state very happy. However, that will naturally get fixed if you
> reorder the patches as per above, so let's turn to the central
> question: is this a good way of getting EndOfLog? The value that would
> be in effect at the time this code is executed is set here:
>
> XLogBeginRead(xlogreader, LastRec);
> record = ReadRecord(xlogreader, PANIC, false);
> EndOfLog = EndRecPtr;
>
> Subsequently we do this:
>
> /* start the archive_timeout timer and LSN running */
> XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
> XLogCtl->lastSegSwitchLSN = EndOfLog;
>
> So at that point the value that GetLastSegSwitchData() would return
> has to match what's in the existing variable. But later XLogWrite()
> will change the value. So the question boils down to whether
> XLogWrite() could have been called between the assignment just above
> and when this code runs. And the answer seems to pretty clear be yes,
> because just above this code, we might have done
> CreateEndOfRecoveryRecord() or RequestCheckpoint(), and just above
> that, we did UpdateFullPageWrites(). So I don't think this is right.
>

You are correct, if XLogWrite() called between the lastSegSwitchLSN
value can be changed, but the question is, will that change in our
case. I think it won't, let me explain.

IIUC, lastSegSwitchLSN will change generally in XLogWrite(), if the
previous WAL has been filled up. But if we see closely what will be
going to be written before we do check lastSegSwitchLSN. Currently, we
have a record for full-page write and record for either recovery end
or checkpoint, all these are fixed size and I don't think going to
fill the whole 16MB wal file. Correct me if I am missing something.

> > > (3) CheckpointStats, which is called from RemoveXlogFile which is
> > > called from RemoveNonParentXlogFiles which is called from
> > > CleanupAfterArchiveRecovery which is called from XLogAcceptWrites.
> > > This last one is actually pretty weird already in the existing code.
> > > It sort of looks like RemoveXlogFile() only expects to be called from
> > > the checkpointer (or a standalone backend) so that it can update
> > > CheckpointStats and have that just work, but actually it's also called
> > > from the startup process when a timeline switch happens. I don't know
> > > whether the fact that the increments to ckpt_segs_recycled get lost in
> > > that case should be considered an intentional behavior that should be
> > > preserved or an inadvertent mistake. >
> >
> > Maybe I could be wrong, but I think that is intentional. It removes
> > pre-allocated or bogus files of the old timeline which are not
> > supposed to be considered in stats. The comments for
> > CheckpointStatsData might not be clear but comment at the calling
> > RemoveNonParentXlogFiles() place inside StartupXLOG hints the same:
>
> Sure, I'm not saying the files are being removed by accident. I'm
> saying it may be accidental that the removals are (I think) not going
> to make it into the stats.
>

Understood, it looks like I missed the concluding line in the previous
reply. My point was if deleting bogus files then why should we care
about counting them in stats.

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-09-24 11:42:53 Re: very long record lines in expanded psql output
Previous Message tushar 2021-09-24 11:28:06 Re: extensible options syntax for replication parser?