Quick Links

Re: Fix 035_standby_logical_decoding.pl race conditions

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Fix 035_standby_logical_decoding.pl race conditions
Date:	2025-03-27 10:20:30
Message-ID:	CAA4eK1+bC-Qx7nrxxVUp8efKF5b549bfEKm3QiGnjmOY5nKLqQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Mar 26, 2025 at 1:17 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > Seeing all these failures, I wonder whether we can reliably test
> > active slots apart from wal_level change test (aka Scenario 6:
> > incorrect wal_level on primary.). Sure, we can try by having some
> > injection point kind of tests, but is it really worth because, anyway
> > the active slots won't get invalidated in the scenarios for row
> > removal we are testing in this case. The other possibility is to add a
> > developer-level debug_disable_running_xact GUC to test this and
> > similar cases, or can't we have an injection point to control logging
> > this WAL record? I have seen the need to control logging running_xact
> > record in other cases as well.
>
> Based on the idea which controls generating RUNNING_XACTS, I prototyped a patch.
> When the instance is attached the new injection point, all processes would skip
> logging the record. This does not need to extend injection_point module.
>

Right, I think this is a better idea. I have a few comments:
1.
+ /*
+ * In 035_standby_logical_decoding.pl, RUNNING_XACTS could move slots's
+ * xmin forward and cause random failures.

No need to use test file name in code comments.

2. The comments atop wait_until_vacuum_can_remove can be changed to
indicate that we will avoid logging running_xact with the help of
injection points.

3.
+ # Note that from this point the checkpointer and bgwriter will wait before
+ # they write RUNNING_XACT record.
+ $node_primary->safe_psql('testdb',
+ "SELECT injection_points_attach('log-running-xacts', 'wait');");

Isn't it better to use 'error' as the second parameter as we don't
want to wait at this injection point?

4.
+ # XXX If the instance does not attach 'log-running-xacts', the bgwriter
+ # pocess would generate RUNNING_XACTS record, so that the test would fail.
+ sleep(20);

I think it is better to make a separate patch (as a first patch) for
this so that it can be used as a reproducer. I suggest to use
checkpoint as used by one of Bertrand's patches to ensure that the
issue reproduces in every environment.

> Sadly IS_INJECTION_POINT_ATTACHED() was introduced for PG18 so that the patch
> could not backport for PG17 as-is.
>

We can use 'wait' mode API in PG17 as used in one of the tests
(injection_points_attach('heap_update-before-pin', 'wait');) but I
think it may be better to just leave testing active slots in
backbranches because anyway the new development happens on HEAD and we
want to ensure that no breakage happens there.

--
With Regards,
Amit Kapila.

In response to

RE: Fix 035_standby_logical_decoding.pl race conditions at 2025-03-26 07:47:36 from Hayato Kuroda (Fujitsu)

Responses

RE: Fix 035_standby_logical_decoding.pl race conditions at 2025-03-28 09:02:29 from Hayato Kuroda (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrea Gelmini	2025-03-27 10:25:25	Re: FileFallocate misbehaving on XFS
Previous Message	Peter Eisentraut	2025-03-27 10:16:07	Re: Thread-safe nl_langinfo() and localeconv()