Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, thomas(dot)munro(at)gmail(dot)com, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-08-10 23:22:48
Message-ID: ZNVxSMWZdNOXN9sH@paquier.xyz
Lists: pgsql-bugs

On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
>> - Use pg_logical_emit_message to fill a few segments with 0xFF.
>> - CHECKPOINT the primary, so the standby recycles segments.
>> - One more pg_logical_emit_message, computing the length from
>> pg_current_wal_insert_lsn() such that the new message crosses a segment boundary
>> and ends 4 bytes before the end of a page.
>> - Stop the primary.
>> - If the bug is present, the standby will exit.
>
> Good idea to pollute the data with recycled segments. Using a minimal
> WAL segment size would help here as well in keeping a test cheap, and
> two segments should be enough. The alignment calculations and the
> header size can be known, but the standby records are an issue for the
> predictability of the test when it comes to adjusting the length of the
> logical message depending on the 8k WAL page, no?
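For illustration only, the length calculation could be sketched like this. It is a hypothetical helper, not part of any existing test; the page header sizes (40 bytes on the first page of a segment, 24 bytes elsewhere) are assumptions for a 64-bit build with 8k WAL pages, and `overhead` (record header, data header, message header and prefix) is an assumed constant that should be measured with pg_waldump rather than trusted:

```python
# Hypothetical sketch: all sizes below are assumptions for a 64-bit build
# with the default 8kB WAL page; check them against the build under test.
XLOG_BLCKSZ = 8192
SHORT_PAGE_HDR = 24  # assumed MAXALIGN'd size of a regular page header
LONG_PAGE_HDR = 40   # assumed size of the header on a segment's first page


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000028' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_header_size(page_start: int, seg_size: int) -> int:
    """The first page of each segment carries the (assumed) long header."""
    return LONG_PAGE_HDR if page_start % seg_size == 0 else SHORT_PAGE_HDR


def record_len_for_target(start_lsn: int, seg_size: int, tail_gap: int = 4) -> int:
    """Total record length (headers included) such that a record starting
    at start_lsn crosses the next segment boundary and ends tail_gap bytes
    before the end of the first page of the new segment."""
    next_seg = (start_lsn // seg_size + 1) * seg_size
    target_end = next_seg + XLOG_BLCKSZ - tail_gap
    length = 0
    pos = start_lsn
    while pos < target_end:
        page_start = pos - pos % XLOG_BLCKSZ
        if pos == page_start:
            # Record bytes never overlap a page header, so skip over it.
            pos += page_header_size(page_start, seg_size)
        chunk = min(page_start + XLOG_BLCKSZ, target_end) - pos
        length += chunk
        pos += chunk
    return length


def message_payload_len(start_lsn: int, seg_size: int, overhead: int) -> int:
    """Payload size to pass to pg_logical_emit_message; 'overhead' is an
    assumed, externally measured constant."""
    return record_len_for_target(start_lsn, seg_size) - overhead
```

With a 1MB segment (the minimum accepted by initdb's --wal-segsize), a call would look like `message_payload_len(parse_lsn(insert_lsn), 1024 * 1024, overhead)`. As noted above, any standby record inserted between reading the insert LSN and emitting the message shifts the start position and invalidates the result.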

Actually, for this one, I think that I have a simpler idea to make it
deterministic. Once we have inserted a record at the page limit on
the primary, we can:
- Stop the standby
- Stop the primary
- Rewrite a few bytes ourselves in the last segment on the standby
to emulate a recycled segment portion, based on the end LSN of the
logical message record, retrieved either with pg_walinspect or
pg_waldump.
- Start the standby, which would replay up to the previous record at
the page limit.
- The standby should then just be in a state where it waits for the
missing records from the primary and keeps retrying streaming, but it
should not fail startup.
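The byte-rewriting step could look roughly like the following. This is a hypothetical sketch: it assumes the end LSN has already been converted from its "X/Y" text form to an integer (high word shifted left by 32, OR'd with the low word), that the timeline ID is known, and that segment files use the usual timeline/log/segment hex naming. Which bytes to overwrite, and filling them with 0xFF as in Noah's recipe, are also assumptions:

```python
import os


def wal_file_name(tli: int, lsn: int, seg_size: int) -> str:
    """Segment file name holding 'lsn': timeline, log id, segment id in hex."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def overwrite_after(pg_wal_dir: str, tli: int, end_lsn: int, seg_size: int,
                    nbytes: int, fill: int = 0xFF) -> tuple:
    """Overwrite nbytes starting at end_lsn with 'fill' to emulate leftover
    bytes of a recycled segment. Hypothetical helper: run it only while the
    standby is stopped."""
    name = wal_file_name(tli, end_lsn, seg_size)
    offset = end_lsn % seg_size
    nbytes = min(nbytes, seg_size - offset)  # never write past this segment
    with open(os.path.join(pg_wal_dir, name), "r+b") as f:
        f.seek(offset)
        f.write(bytes([fill]) * nbytes)
    return name, offset
```

Opening the file with "r+b" rewrites the bytes in place without truncating the segment, so its size stays at the configured segment size.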
--
Michael
