Re: needless complexity in StartupXLOG

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: needless complexity in StartupXLOG
Date: 2021-07-26 17:32:31
Message-ID: 20210726173231.GU20766@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> So at the moment I am leaning toward the view that we should just
> remove this check entirely, as in the attached, proposed patch.

Haven't dug in deeply but at least following your explanation and
reading over the patch and the code a bit, I tend to agree.

> Really, I think we should consider going further. If it's safe to
> write an end-of-recovery record rather than a checkpoint, why not do
> so all the time? Right now we fail to do that in the above-described
> "impossible" scenario where the previous checkpoint record can't be
> read, or if we're exiting archive recovery for some reason other than
> a promotion request, or if we're in single-user mode, or if we're in
> crash recovery. Presumably, people would like to start up the server
> quickly in all of those scenarios, so the only reason not to use this
> technology all the time is if we think it's safe in some scenarios and
> not others. I can't think of a reason why it matters why we're exiting
> archive recovery, nor can I think of a reason why it matters whether
> we're in single user mode. The distinction between crash recovery and
> archive recovery does seem to matter, but if anything the crash
> recovery scenario seems simpler, because then there's only one
> timeline involved.

Yeah, tend to agree with this too ... but something I find a bit curious
is the comment:

* Insert a special WAL record to mark the end of
* recovery, since we aren't doing a checkpoint.

... immediately after setting promoted = true, and then at the end of
StartupXLOG() having:

if (promoted)
RequestCheckpoint(CHECKPOINT_FORCE);

maybe I'm missing something, but seems like that comment isn't being
terribly clear. Perhaps we aren't doing a full checkpoint *there*, but
sure looks like we're going to do one moments later regardless of
anything else since we've set promoted to true..?

> I realize that conservatism may have played a role in this code ending
> up looking the way that it does; someone seems to have thought it
> would be better not to rely on a new idea in all cases. From my point
> of view, though, it's scary to have so many cases, especially cases
> that don't seem like they should ever be reached. I think that
> simplifying the logic here and trying to do the same things in as many
> cases as we can will lead to better robustness. Imagine if instead of
> all the hairy logic we have now we just replaced this whole if
> (IsInRecovery) stanza with this:
>
> if (InRecovery)
> CreateEndOfRecoveryRecord();
>
> That would be WAY easier to reason about than the rat's nest we have
> here today. Now, I am not sure what it would take to get there, but I
> think that is the direction we ought to be heading.

Agreed that simpler logic is better, provided it's correct logic, of
course. Finding better ways to test all of this would be really nice.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-07-26 17:33:03 Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep
Previous Message Alvaro Herrera 2021-07-26 17:30:42 Re: Skip temporary table schema name from explain-verbose output.