Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-08-12 21:13:27
Message-ID: 20230812211327.GB2326466@rfd.leadboat.com
Lists: pgsql-bugs

On Fri, Aug 11, 2023 at 08:32:00PM -0400, Tom Lane wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > On Sat, Aug 12, 2023 at 11:56:45AM +1200, Thomas Munro wrote:
> >> On Sat, Aug 12, 2023 at 2:00 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> >>> I recall earlier messages theorizing that it was just harder to hit in v14, so
> >>> I'm disinclined to stop at v15. I think the main choice is whether to stop at
> >>> v11 (normal choice) or v12 (worry about breaking the last v11 point release).
> >>> I don't have a strong opinion between those.
>
> > Okay. I wouldn't be inclined to patch v11 for that, FWIW, as this
> > code path is touched by recovery and more. At least it does not seem
> > worth taking any risk compared to the potential gain.
>
> That was my gut reaction too -- risk/reward seems not great for the
> last v11 release. (I've been burned a couple of times now by putting
> can't-fix-it-anymore bugs into the final release of a branch, so maybe
> I'm just being overly paranoid. But it's something to worry about.)

Okay, v12+ would be fine with me. Alas, a closer look at v1 found two
defects, one with severity far greater than the bug v1 aimed to fix:

===== 1. For oversized records, it changed startup FATAL to silent data loss

This affects only supported branches (v15-). In v16+, commit 8fcb32d
introduced and checked XLogRecordMaxSize. But in v15 with patch v1, redo of
pg_logical_emit_message(false, '_', repeat('x', 1024 * 1024 * 1024 - 1000))
just ends at the oversized record:

2523581 2023-08-12 17:17:39.512 GMT LOG: redo starts at 0/41F155B8
2523581 2023-08-12 17:17:39.512 GMT LOG: record length 1073740879 at 0/41F155F0 too long
2523581 2023-08-12 17:17:39.512 GMT LOG: redo done at 0/41F155B8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

Any user could call pg_logical_emit_message() to silently terminate the WAL
stream, which is far worse than the original bug. So far, I'm seeing one way
to clearly fix $SUBJECT without that harm. When a record header spans a page
boundary, read the next page to reassemble the header. If
!ValidXLogRecordHeader() (invalid xl_rmid or xl_prev), treat as end of WAL.
Otherwise, read the whole record in chunks, calculating CRC. If CRC is
invalid, treat as end of WAL. Otherwise, ereport(FATAL) for the oversized
record. A side benefit would be avoiding useless large allocations (1GB
backend, 4GB frontend) as discussed upthread. May as well do the xl_rmid and
xl_prev checks in all branches, to avoid needless XLogRecordMaxSize-1
allocations. Thoughts? For invalid-length records in v16+, since every such
record is end-of-wal or corruption, those versions could skip the CRC.
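
For clarity, a standalone sketch of that order of checks (illustration
only, not xlogreader.c code; the names are mine, and the three booleans
stand in for the reassembled-header ValidXLogRecordHeader() test, the
chunked CRC, and the size limit):

===
#include <stdbool.h>

/* Illustration of the proposed decision order; not actual xlogreader.c code. */
typedef enum
{
	WAL_TREAT_AS_END,			/* presume end of WAL; stop replay quietly */
	WAL_REPORT_FATAL,			/* CRC-valid but oversized: complain loudly */
	WAL_RECORD_OK
} WalVerdict;

WalVerdict
classify_record(bool header_valid, bool crc_valid, bool oversized)
{
	if (!header_valid)			/* bad xl_rmid or xl_prev in reassembled header */
		return WAL_TREAT_AS_END;
	if (!crc_valid)				/* CRC over the whole record, read in chunks */
		return WAL_TREAT_AS_END;
	if (oversized)				/* internally consistent, so not recycled garbage */
		return WAL_REPORT_FATAL;
	return WAL_RECORD_OK;
}
===

The point is that FATAL fires only after the record has proven internally
consistent, so garbage from recycled segments still ends recovery quietly.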

As an alternative, zeroing recycled pages could remove "most of" the spurious
errors without adding silent data loss. I considered another alternative of
back-patching XLogRecordMaxSize and proceeding with a patch like v1, but that
wouldn't help with records in existing WAL archives. Of course, we could also
choose to leave $SUBJECT unfixed in v15-.

===== 2. DecodeXLogRecordRequiredSpace() is the wrong ceiling

v1 treats DecodeXLogRecordRequiredSpace(L) (=L+2445 here) as the max needed,
but allocate_recordbuf(state, L) allocates up to L+8192. If the uninitialized
length happened to contain an integer between about 1<<30-8192 and 1<<30-2445,
one still got e.g. "FATAL: invalid memory alloc request size 1073741824".
Since v16+ has XLogRecordMaxSize, testing XLogRecordMaxSize avoids the problem
and should suffice:

===
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -640,7 +640,6 @@ restart:
 							  (uint32) SizeOfXLogRecord, total_len);
 		goto err;
 	}
-#ifndef FRONTEND
 
 	/*
 	 * We may be looking at a random uint32 read from a recycled segment. For
@@ -651,13 +650,12 @@ restart:
 	 * the allocation may succeed but record checks are going to fail so this
 	 * would be short-lived.
 	 */
-	if (!AllocSizeIsValid(DecodeXLogRecordRequiredSpace(total_len)))
+	if (total_len > XLogRecordMaxSize)
 	{
 		report_invalid_record(state, "record length %u at %X/%X too long",
 							  total_len, LSN_FORMAT_ARGS(RecPtr));
 		goto err;
 	}
-#endif
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
===
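
To put numbers on the window described above (assuming XLOG_BLCKSZ = 8192,
MaxAllocSize = 1073741823, and the +2445 figure above), take
total_len = 1073739000:

  DecodeXLogRecordRequiredSpace(total_len) = 1073739000 + 2445 = 1073741445,
  which passes AllocSizeIsValid(), but allocate_recordbuf() rounds the buffer
  up to the next XLOG_BLCKSZ multiple, 1073741824, which exceeds MaxAllocSize;
  hence the "invalid memory alloc request size 1073741824" quoted above.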

allocate_recordbuf()'s addend is not critical; it's just to "avoid useless
small increases". v15- should avoid the problem like this:

===
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 969bcc3..1bac303 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -201,4 +201,8 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
 	newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
+#ifndef FRONTEND
+	if (!AllocSizeIsValid(newSize))
+		newSize = reclength;
+#endif
 
 	if (state->readRecordBuf)
===

In v15, one can reach this case with pg_logical_emit_message(false, '_',
repeat('x', 1024 * 1024 * 1024 - 4000)). On v16+, one might reach this with
unlucky trailing garbage, but XLogRecordMaxSize prevents reaching it through
pg_logical_emit_message().

Thanks,
nm
