From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, thomas(dot)munro(at)gmail(dot)com, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date: | 2023-08-11 02:58:08 |
Message-ID: | 20230811025808.GA2261232@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
> > On my system, testing unpatched REL_15_2, the test reached its first failure
> > at minute 53 and its second failure at minute 189. Perhaps a faster,
> > deterministic test would be feasible:
>
> It took less than five minutes for me to reproduce the failure using
> the test case sent by Alexander on HEAD:
> 2023-08-10 15:26:37.401 JST [11239] FATAL: invalid memory alloc request size 2021163525
> 2023-08-10 15:26:37.405 JST [11235] LOG: startup process (PID 11239) exited with exit code 1
>
> I don't see it after two hours of tests with the patch.
That is good.
> > - Use pg_logical_emit_message to fill a few segments with 0xFF.
> > - CHECKPOINT the primary, so the standby recycles segments.
> > - One more pg_logical_emit_message, computing the length from
> > pg_current_wal_insert_lsn() such that new message crosses a segment boundary
> > and ends 4 bytes before the end of a page.
> > - Stop the primary.
> > - If the bug is present, the standby will exit.
>
> Good idea to pollute the data with recycled segments. Using a minimal
> WAL segment size whould help here as well in keeping a test cheap, and
> two segments should be enough. The alignment calculations and the
> header size can be known, but the standby records are an issue for the
> predictability of the test when it comes to adjust the length of the
> logical message depending on the 8k WAL page, no?
Could be. I expect there would be challenges translating that outline into a
real test, but I don't know if that will be a major one. The test doesn't
need to be 100% deterministic. If it fails 25% of the time and is not the
slowest test in the recovery suite, I'd find that good enough.
> >>>> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> >>>
> >>> Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> >>> reading the whole header before allocating xl_tot_len? (Are there benefits
> >>> other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> >>> that benefit worth the cycles?)
> >>
> >> I believe reading the whole header is the most sensible approach as it
> >> can prevent unnecessary memory requests. Another potential solution
> >> (or hack) for this specific case is to let XLogWalRcvFlush write a
> >> finalizing ((uint32)0) when dying is true. This stabilizes the
> >> behavior to "invalid record length.. got 0" when running the TAP test.
>
> FWIW, I came back to this thread while tweaking the error reporting of
> xlogreader.c for the sake of this thread and this proposal is an
> improvement to be able to make a distinction between an OOM and an
> incorrect record:
> https://www.postgresql.org/message-id/ZMh/WV+CuknqePQQ(at)paquier(dot)xyz
>
> Anyway, agreed that it's an improvement to remove this check out of
> allocate_recordbuf(). Noah, are you planning to work more on that?
I can push xl_tot_len-validate-v1.patch, particularly given the testing result
you reported today. I'm content for my part to stop there.
> >> --- a/src/backend/replication/walreceiver.c
> >> +++ b/src/backend/replication/walreceiver.c
> >> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
> >> {
> >> Assert(tli != 0);
> >>
> >> + if (dying)
> >> + {
> >> + int buflen = sizeof(uint32);
> >> + char buf[sizeof(uint32)];
> >> +
> >> + memset(buf, 0, buflen);
> >> + XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> >> + }
> >> +
> >> if (LogstreamResult.Flush < LogstreamResult.Write)
> >> {
> >> WalRcvData *walrcv = WalRcv;
> >
> > That's inexpensive in cycles, and it should make the failures in question
> > quite rare. If we do that instead of pre-zeroing, I think we'd still want to
> > be ready for invalid lengths, for the case of unclean standby shutdown.
>
> Tweaking the WAL receiver to do more zeroing has been discussed in the
> past, but as far as I recall we were not completely sure whether it's
> completely free, either. It seems to me that this should have a
> discussion on its own, in a new thread. Not sure if we should worry
> about archiving, actually, even if the segments should be padded with
> zeros beforehand.
Okay.
> + gotheader = targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord ? true : false;
Okay.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2023-08-11 06:08:26 | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Previous Message | David G. Johnston | 2023-08-11 02:42:19 | Re: BUG #18040: PostgreSQL does not report its version correctly |