From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date: | 2023-09-02 01:29:39 |
Message-ID: | ZPKQA+HHqLVRFBSs@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, Aug 21, 2023 at 08:32:39AM +0900, Michael Paquier wrote:
> I am not sure that I'll be able to do more on this topic this week, at
> least that's some progress.
Some time later..
I have spent more time and thoughts on the whole test suite, finishing
with the attached 0003 that applies on top of your own patches. I am
really looking forward to making this whole logic more robust, so as
WAL replay can be made itself more robust for the OOM/end-of-wal
detection on HEAD for standbys and crash recovery.
While looking at the whole, I have considered a few things that may
make the test cleaner, like:
- Calculating the segment name and its offset from the end_lsn of a
record directly from the backend, but it felt inelegant to pass
through more subroutine layers the couple ($segment, offset) rather
than just a LSN, so guessing the segment number and the offset while
the cluster is offline if OK by me.
- The TLI can be queried from the server rather than hardcoded.
- I've been thinking about bundling the tests of each sub-section in
their own subroutine, but that felt a bit awkward, particularly for
the part where we need a correct $prev_lsn in the record header
written to enforce other code paths.
- The test needs better documentation. One of the things I kept
staring at is cross-checking pack() and the dependency to the C
structures, so I have added more details there, explaining more the
whys and the hows.
I have also looked again at the C code for a few hours, and still got
the impression that this is rather solid. There are two things that
may be better:
- Document at the top of allocate_recordbuf() that this should never
be called with a length coming from a header until it is validated.
- Removing AllocSizeIsValid() for the non-FRONTEND path should be OK.
What do you think?
--
Michael
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Add-TAP-tests-for-end-of-WAL-conditions.patch | text/x-diff | 12.8 KB |
v3-0002-Don-t-trust-unvalidated-xl_tot_len.patch | text/x-diff | 7.8 KB |
v3-0003-Adjust-few-things-on-top-of-Thomas-patch-series.patch | text/x-diff | 22.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Sergei Kornilov | 2023-09-02 11:51:48 | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Previous Message | Michael Paquier | 2023-09-02 00:08:22 | Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon |