| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: walsender vs. XLogBackgroundFlush during shutdown | 
| Date: | 2019-05-01 15:53:15 | 
| Message-ID: | 20190501155315.kme7zjesq3rdtkfu@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:
> The reason for that is pretty simple - the walsenders are doing logical
> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
> But per XLogSendLogical() this only happens if
> 
>    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>    {
>        WalSndCaughtUp = true;
>        ...
>    }
> 
> That is, we need to get beyong GetFlushRecPtr(). But that may never
> happen, because there may be incomplete (and unflushed) page in WAL
> buffers, not forced out by any transaction. So if there's a WAL record
> overflowing to the unflushed page, the walsender will never catch up.
> 
> Now, this situation is apparently expected, because WalSndWaitForWal()
> does this:
> 
>    /*
>     * If we're shutting down, trigger pending WAL to be written out,
>     * otherwise we'd possibly end up waiting for WAL that never gets
>     * written, because walwriter has shut down already.
>     */
>    if (got_STOPPING)
>        XLogBackgroundFlush();
> 
> but unfortunately that does not actually do anything, because right at
> the very beginning XLogBackgroundFlush() does this:
> 
>    /* back off to last completed page boundary */
>    WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
> That is, it intentionally ignores the incomplete page, which means the
> walsender can't read the record and reach GetFlushRecPtr().
> 
> XLogBackgroundFlush() does this since (at least) 2007, so how come we
> never had issues with this before?
I assume that's because of the following logic:
	/* if we have already flushed that far, consider async commit records */
	if (WriteRqst.Write <= LogwrtResult.Flush)
	{
		SpinLockAcquire(&XLogCtl->info_lck);
		WriteRqst.Write = XLogCtl->asyncXactLSN;
		SpinLockRelease(&XLogCtl->info_lck);
		flexible = false;		/* ensure it all gets written */
	}
and various pieces of the code doing XLogSetAsyncXactLSN() to force
flushing.  I wonder if the issue is that we're better at avoiding
unnecessary WAL to be written due to
6ef2eba3f57f17960b7cd4958e18aa79e357de2f
> But I don't think we're safe without the failover slots patch, because
> any output plugin can do something similar - say, LogLogicalMessage() or
> something like that. I'm not aware of a plugin doing such things, but I
> don't think it's illegal / prohibited either. (Of course, plugins that
> generate WAL won't be useful for decoding on standby in the future.)
FWIW, I'd consider such an output plugin outright broken.
> So what I think we should do is to tweak XLogBackgroundFlush() so that
> during shutdown it skips the back-off to page boundary, and flushes even
> the last piece of WAL. There are only two callers anyway, so something
> like XLogBackgroundFlush(bool) would be simple enough.
I think it then just ought to be a normal XLogFlush(). I.e. something
along the lines of:
		/*
		 * If we're shutting down, trigger pending WAL to be written out,
		 * otherwise we'd possibly end up waiting for WAL that never gets
		 * written, because walwriter has shut down already.
		 */
		if (got_STOPPING && !RecoveryInProgress())
			XLogFlush(GetXLogInsertRecPtr());
		/* Update our idea of the currently flushed position. */
		if (!RecoveryInProgress())
			RecentFlushPtr = GetFlushRecPtr();
		else
			RecentFlushPtr = GetXLogReplayRecPtr(NULL);
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2019-05-01 15:59:12 | Re: hyrax vs. RelationBuildPartitionDesc | 
| Previous Message | Andres Freund | 2019-05-01 15:36:37 | Re: New vacuum option to do only freezing |