Re: Add isolation test template in injection_points for wait/wakeup/detach

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add isolation test template in injection_points for wait/wakeup/detach
Date: 2025-02-07 04:41:20
Message-ID: Z6WO8FbqK_FHmrzC@paquier.xyz
Lists: pgsql-hackers

On Wed, Feb 05, 2025 at 11:03:25AM +0000, Bertrand Drouvot wrote:
> I think that makes sense and the patch LGTM.
> A few tests are already using this technique (including injection_points in
> inplace.spec).

Well, that's not the end of the story.  I have checked the CF bot for
some activity and noticed this failure:
https://cirrus-ci.com/task/6565511640645632

That's rather hard to reach, it seems, but I also got it in one of my
own runs:
--- /tmp/cirrus-ci-build/src/test/modules/injection_points/expected/basic.out 2025-02-06 23:53:40.838077000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/injection_points/isolation/results/basic.out 2025-02-06 23:56:21.848507000 +0000
@@ -13,13 +13,14 @@

(1 row)

+step detach2: SELECT injection_points_detach('injection-points-wait'); <waiting ...>
step wait1: <... completed>
injection_points_run
--------------------

(1 row)

-step detach2: SELECT injection_points_detach('injection-points-wait');
+step detach2: <... completed>
injection_points_detach
-----------------------

This is telling us that the isolation tester can see the detach step
as waiting before the wait step reports its completion.  I didn't
think it would be possible to get that, but well, we do.

A marker like detach2(wait1) is not enough to cover that, as it only
ensures the order of the step completion output.  Using detach2(*),
which would cause the detach2 step to show as <waiting> immediately,
is not good either, as the wait could always complete between the
detach's <waiting> and <completed>.
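
For reference, the two marker variants discussed above would look
roughly like this in the spec file.  This is only a sketch: the
wakeup2 step name is an assumption, as it does not appear in the
output quoted here.

```
# Sketch only; wakeup2 is an assumed step name.
# Completion of detach2 is reported only after wait1's completion,
# but detach2 can still be launched, and seen as waiting, earlier.
permutation wait1 wakeup2 detach2(wait1)

# detach2 is reported as <waiting ...> as soon as it is launched;
# wait1 may still complete before or after detach2 does.
permutation wait1 wakeup2 detach2(*)
```

In both cases the marker only changes when a completion is reported,
not when the step is launched.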

There is a stronger trick mentioned at the end of the README that
should be able to solve this new problem as well as the original one:
an empty step between the wait and the detach.  If we do that, the
detach will never be launched until the wait has fully completed,
bringing a stronger ordering of the events: we should never see the
detach as waiting like in this new problem, nor would we see the first
problem where the wait would report its result after the detach.
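
As a rough sketch of the empty-step approach (not necessarily what the
attached patch does word for word), the spec could look like the
following.  The noop1 and wakeup2 step names and the session setup
block are assumptions made for illustration, and the file-level setup
creating the extension is left out.

```
# Sketch only; noop1, wakeup2 and the setup block are assumptions.
session s1
setup	{
	SELECT injection_points_set_local();
	SELECT injection_points_attach('injection-points-wait', 'wait');
}
step wait1	{ SELECT injection_points_run('injection-points-wait'); }
# Empty step in the waiting session: it cannot be launched until
# wait1 has fully completed.
step noop1	{ }

session s2
step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
step detach2	{ SELECT injection_points_detach('injection-points-wait'); }

# detach2 is launched only once noop1 is done, hence after wait1.
permutation wait1 wakeup2 noop1 detach2
```

The empty step has to live in the session doing the wait, since the
isolation tester launches a step only once all the prior steps of the
same session are done.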

I have done a total of 10 runs in the CI with the attached, without
getting a failure. HEAD was failing a bit more easily than that, with
at least one failure every 5 runs in my branches. Will go adjust that
in a bit as per the attached.
--
Michael

Attachment Content-Type Size
0001-Extra-tweak-for-injection-test-permutation.patch text/x-diff 2.3 KB
