Re: Infinite loop in XLogPageRead() on standby

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com
Subject: Re: Infinite loop in XLogPageRead() on standby
Date: 2025-01-16 00:42:49
Message-ID: Z4hWCcAioBwhKJU-@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 15, 2025 at 10:35:42AM +0100, Alexander Kukushkin wrote:
> Thank you for picking it up. I briefly looked at both patches. The actual
> fix in XLogPageRead() looks good to me.
> I also agree with suggested refactoring, where there is certainly some room
> for improvement - $WAL_SEGMENT_SIZE, $WAL_BLOCK_SIZE variables and
> get_int_setting(), start_of_page() funcs are still duplicated in both test
> files.
> Maybe we can have something like the following in Cluster.pm? It'll allow
> further simplify tests and reduce code duplication.

Yeah, I was looking at that, but disliked a bit this option compared
to the routines doing WAL manipulations which are more complex with
their own ways of getting close to page limits, because I feel that we
should be smarter with the interfaces of these routines with more
options. For example, we have other things scanning control file
data like system_identifier in 040_pg_createsubscriber.p, so we could
have a SQL that wraps around the pg_control_* functions with a custom
field, like we do for "sub lsn" in Cluster.pm. Same comment for the
pg_settings queries, which are not limited to only what we do here.
The wrapper around integers is useful, of course, but we could make a
refactored routine apply a cast after checking its unit internally,
or something like that.

The routines for the start page position would not be fit within
Cluster.pm as it does not depend on a $node. Perhaps Utils.pm? Here
again, living with this small duplication felt OK in the scope of this
bug fix. I'm of course open to tuning all that, though my primary
goal is to wrap the fix :D

I've applied the first refactoring bits down to v13 (see for example a
s/emit_message/emit_wal/ tweaked for consistency, with more comment
tweaks). Attached are patches for each branch for the bug fix, that
I'm still testing and looking at more. The readability of
043_no_contrecord_switch.pl looks rather OK here.
--
Michael

Attachment Content-Type Size
v2-0002-Fix-incorrect-header-check-for-continuatio-master.patch text/x-diff 8.4 KB
v2-0002-Fix-incorrect-header-check-for-continuation-W-v17.patch text/x-diff 8.4 KB
v2-0002-Fix-incorrect-header-check-for-continuation-W-v16.patch text/x-diff 8.3 KB
v2-0002-Fix-incorrect-header-check-for-continuation-W-v15.patch text/x-diff 8.0 KB
v2-0002-Fix-incorrect-header-check-for-continuation-W-v14.patch text/x-diff 8.6 KB
v2-0002-Fix-incorrect-header-check-for-continuation-W-v13.patch text/x-diff 7.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2025-01-16 00:43:28 Re: Skip collecting decoded changes of already-aborted transactions
Previous Message David Rowley 2025-01-16 00:42:09 Re: [PATCH] Hex-coding optimizations using SVE on ARM.