From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Possible crash on standby |
Date: | 2022-09-09 08:29:49 |
Message-ID: | 20220909.172949.2223165886970819060.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
Hello.
While playing with a patch, I hit an assertion failure.
#2 0x0000000000b350e0 in ExceptionalCondition (
conditionName=0xbd8970 "!IsInstallXLogFileSegmentActive()",
errorType=0xbd6e11 "FailedAssertion", fileName=0xbd6f28 "xlogrecovery.c",
lineNumber=4190) at assert.c:69
#3 0x0000000000586f9c in XLogFileRead (segno=61, emode=13, tli=1,
source=XLOG_FROM_ARCHIVE, notfoundOk=true) at xlogrecovery.c:4190
#4 0x00000000005871d2 in XLogFileReadAnyTLI (segno=61, emode=13,
source=XLOG_FROM_ANY) at xlogrecovery.c:4296
#5 0x000000000058656f in WaitForWALToBecomeAvailable (RecPtr=1023410360,
randAccess=false, fetching_ckpt=false, tliRecPtr=1023410336, replayTLI=1,
replayLSN=1023410336, nonblocking=false) at xlogrecovery.c:3727
This is reproducible with the following steps.
1. insert a sleep(1) in WaitForWALToBecomeAvailable().
> * WAL that we restore from archive.
> */
> + sleep(1);
> if (WalRcvStreaming())
> XLogShutdownWalRcv();
2. create a primary with archiving enabled.
3. create a standby that recovers from the primary's archive and has an
unconnectable primary_conninfo.
4. start the primary.
5. switch WAL on the primary.
6. Kaboom.
This is because WaitForWALToBecomeAvailable() doesn't call
XLogShutdownWalRcv() when walreceiver has already stopped before we reach
the WalRcvStreaming() call cited above. But we need to set
InstallXLogFileSegmentActive to false even in that case, since nothing
other than the startup process does that.
Unconditionally calling XLogShutdownWalRcv() fixes it. I feel we might
need to correct the dependencies between the flag and walreceiver
state, but it is not mandatory because XLogShutdownWalRcv() is designed
so that it can be called even after walreceiver is stopped. I don't
clearly remember why we did that at the time, but the recovery check
runs successfully with this change.
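
The failure mode and the fix described above can be sketched as a small
standalone toy model. This is not PostgreSQL source: ToyState and every
toy_* function are hypothetical names invented for illustration, and the
model assumes only what is stated above (the flag is set when streaming is
requested, and only the startup process's shutdown call clears it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant state; NOT the real PostgreSQL structures. */
typedef struct ToyState
{
    bool walreceiver_streaming;   /* models what WalRcvStreaming() reports */
    bool install_segment_active;  /* models InstallXLogFileSegmentActive */
} ToyState;

/* Models the startup process requesting streaming: flag set, receiver up. */
static void
toy_request_streaming(ToyState *s)
{
    assert(s != NULL);
    s->install_segment_active = true;
    s->walreceiver_streaming = true;
}

/* Models walreceiver stopping on its own; it never touches the flag. */
static void
toy_walreceiver_stops(ToyState *s)
{
    assert(s != NULL);
    s->walreceiver_streaming = false;
}

/* Models XLogShutdownWalRcv(): harmless if already stopped, clears the flag. */
static void
toy_shutdown_walrcv(ToyState *s)
{
    assert(s != NULL);
    s->walreceiver_streaming = false;
    s->install_segment_active = false;
}

/*
 * Models the switch to reading from archive. Returns true when the
 * condition checked by the assertion (flag is not active) would hold.
 */
static bool
toy_switch_to_archive(ToyState *s, bool unconditional)
{
    assert(s != NULL);
    if (unconditional || s->walreceiver_streaming)
        toy_shutdown_walrcv(s);
    return !s->install_segment_active;
}

/* Runs the sequence from the report: request, receiver stops, then switch. */
static bool
toy_scenario(bool unconditional)
{
    ToyState s = {false, false};

    toy_request_streaming(&s);
    toy_walreceiver_stops(&s);   /* happens during the inserted sleep(1) */
    return toy_switch_to_archive(&s, unconditional);
}
```

In this model, toy_scenario(false) corresponds to the current conditional
call and leaves the flag set, while toy_scenario(true) corresponds to the
unconditional call and clears it.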
This code was introduced in PG12.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Do-not-skip-calling-XLogShutdownWalRcv.patch | text/x-patch | 1.3 KB |