From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Assertion failure in WaitForWALToBecomeAvailable state machine |
Date: | 2022-02-11 10:02:45 |
Message-ID: | CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
The problem is that whenever we switch to streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but when switching
from streaming back to archive we do not always reset it, so the
assertion is hit in some cases. The flag is reset inside
XLogShutdownWalRcv(), but when leaving streaming mode we call that
only if WalRcvStreaming() returns true. It is quite possible that,
before we call WalRcvStreaming(), the walreceiver has already set
walrcv->walRcvState to WALRCV_STOPPED, so WalRcvStreaming() returns
false. I agree that in this case we do not need to shut down the
walreceiver, but then who will reset the flag?
I just ran some tests on the primary, attached gdb to the walreceiver,
and waited for it to exit with a timeout; the recovery process then
hit the assertion.
2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver due to timeout
cp: cannot stat ‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’: No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG: restored log file "000000010000000000000003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File: "xlog.c", Line: 3823, PID: 60973)
I have applied a quick fix and that solved the issue: if the last
failed source was streaming and WalRcvStreaming() returns false, then
just reset this flag.
@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                      */
                     if (WalRcvStreaming())
                         XLogShutdownWalRcv();
+                    else
+                    {
+                        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+                        XLogCtl->InstallXLogFileSegmentActive = false;
+                        LWLockRelease(ControlFileLock);
+                    }
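
To make the sequence easier to follow, here is a small standalone model of the flag handling described above. It is only a sketch, not PostgreSQL code: the ModelState struct and all model_* functions are hypothetical stand-ins for XLogCtl->InstallXLogFileSegmentActive, walrcv->walRcvState, WalRcvStreaming() and XLogShutdownWalRcv(), and it ignores locking (the real patch takes ControlFileLock) and everything else the real state machine does.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef enum { MODEL_WALRCV_STOPPED, MODEL_WALRCV_STREAMING } ModelWalRcvState;

typedef struct
{
    bool install_segment_active;    /* models XLogCtl->InstallXLogFileSegmentActive */
    ModelWalRcvState walrcv_state;  /* models walrcv->walRcvState */
} ModelState;

/* Switching to streaming always sets the flag. */
static void
model_start_streaming(ModelState *s)
{
    s->install_segment_active = true;
    s->walrcv_state = MODEL_WALRCV_STREAMING;
}

/* The walreceiver exits on its own (e.g. timeout): state changes, flag does not. */
static void
model_walreceiver_exits(ModelState *s)
{
    s->walrcv_state = MODEL_WALRCV_STOPPED;
}

/* Models WalRcvStreaming(). */
static bool
model_walrcv_streaming(const ModelState *s)
{
    return s->walrcv_state == MODEL_WALRCV_STREAMING;
}

/* Models XLogShutdownWalRcv(): stops the receiver and resets the flag. */
static void
model_shutdown_walrcv(ModelState *s)
{
    s->walrcv_state = MODEL_WALRCV_STOPPED;
    s->install_segment_active = false;
}

/*
 * Models the switch from streaming to archive. with_fix selects whether
 * the proposed else branch (reset the flag anyway) is present.
 */
static void
model_switch_to_archive(ModelState *s, bool with_fix)
{
    if (model_walrcv_streaming(s))
        model_shutdown_walrcv(s);
    else if (with_fix)
        s->install_segment_active = false;
}

/* True if restoring from archive would trip the assertion on the flag. */
static bool
model_restore_would_assert(const ModelState *s)
{
    return s->install_segment_active;
}
```

In this model, calling model_start_streaming(), then model_walreceiver_exits(), then model_switch_to_archive() with with_fix set to false leaves the flag set, which corresponds to the TRAP above; with with_fix set to true the flag is cleared.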
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com