From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Assertion failure in WaitForWALToBecomeAvailable state machine |
Date: | 2022-09-13 04:10:36 |
Message-ID: | YyACvP++zgDphlcm@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Sep 12, 2022 at 08:22:56PM -0700, Noah Misch wrote:
> I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
> the code location being changed. Today, in the unlikely event of
> !WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
> proceeds without waiting for WALRCV_STOPPED. I think WALRCV_STOPPING can't
> happen there. Even if WALRCV_WAITING also can't happen, it's a simplicity win
> to remove the need to think about that. Dilip Kumar and Nathan Bossart also
> stated support for that design.
I did not notice this one. Sounds pretty much right.
> If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
> perhaps should back-patch the change to released versions. Does anyone know
> whether one or both can happen?
Hmm. I agree that this could be a good thing in the long-term.
> The startup process maintains the invariant that archive recovery and
> streaming recovery don't overlap in time; one stops before the other starts.
> As long as it achieves that, I'm not aware of a cause to care whether the flag
> reset happens when streaming ends vs. when archive recovery begins. If the
> startup process develops a defect such that it ceases to maintain that
> invariant, "when archive recovery begins" does have the
> advantage-or-disadvantage of causing a failure in non-assert builds. The
> other way gets only an assertion failure. Michael Paquier or Bharath
> Rupireddy, did you envision benefits other than that one?
I think that you are forgetting one case here though: a standby where
standby.signal is set without restore_command and with
primary_conninfo. It could move on with recovery even if the WAL
receiver is unstable when something external pushes more WAL segments
to the standby's pg_wal/. So my point of upthread is that we should
make sure that standby.signal+primary_conninfo resets the flag state
when the WAL receiver stops streaming in this case as well.
The proposed v1-0001-Do-not-skip-calling-XLogShutdownWalRcv.patch
would achieve that, because it does not change the state of
InstallXLogFileSegmentActive depending on the source being
XLOG_FROM_ARCHIVE. So I'm fine with what you want to do.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2022-09-13 04:32:11 | Re: Background writer and checkpointer in crash recovery |
Previous Message | Zhang Mingli | 2022-09-13 03:35:37 | Removed unused param isSlice of function transformAssignmentSubscripts |