From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add isolation test template in injection_points for wait/wakeup/detach |
Date: | 2025-02-07 04:41:20 |
Message-ID: | Z6WO8FbqK_FHmrzC@paquier.xyz |
Lists: | pgsql-hackers |
On Wed, Feb 05, 2025 at 11:03:25AM +0000, Bertrand Drouvot wrote:
> I think that makes sense and the patch LGTM.
> A few tests are already using this technique (including injection_points in
> inplace.spec).
Well, that's not the end of the story. I have gone through the CF bot
for recent activity and noticed this one:
https://cirrus-ci.com/task/6565511640645632
That seems rather hard to reach, but I also got it in one of my own
runs:
--- /tmp/cirrus-ci-build/src/test/modules/injection_points/expected/basic.out 2025-02-06 23:53:40.838077000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/injection_points/isolation/results/basic.out 2025-02-06 23:56:21.848507000 +0000
@@ -13,13 +13,14 @@
(1 row)
+step detach2: SELECT injection_points_detach('injection-points-wait'); <waiting ...>
step wait1: <... completed>
injection_points_run
--------------------
(1 row)
-step detach2: SELECT injection_points_detach('injection-points-wait');
+step detach2: <... completed>
injection_points_detach
-----------------------
This is telling us that the detach step could be seen as waiting by
the isolation tester before the wait step reports its completion. I
didn't think it would be possible to get that, but well, we do.
A marker like detach2(wait1) is not enough to cover that, as this
only ensures the order of the step completion output. Using detach2(*),
which would cause the detach2 step to show as <waiting> immediately, is
not good either, as the wait could always complete between the
detach's <waiting> and <completed>.
There is a stronger trick mentioned at the end of the README that
should be able to solve this new problem as well as the original one:
an empty step between the wait and the detach. If we do that, the
detach will never be launched until the wait has fully completed,
bringing a stronger ordering of the events: we should never see the
detach as waiting like in this new problem, nor would we see the first
problem where the wait would report its result after the detach.
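As a rough sketch of that idea, and not a copy of the attached patch, the spec could look something like the fragment below. The step names noop1 and wakeup2 are assumptions made for illustration, and the setup and teardown blocks are left out. It also assumes that the empty step lives in the same session as the wait, relying on the isolation tester not launching a new step in a session whose previous step has not completed yet.

```
# Sketch only: noop1 and wakeup2 are hypothetical step names.
session s1
step wait1	{ SELECT injection_points_run('injection-points-wait'); }
# Empty step, placed in the session that waits (assumption).  The tester
# has to see wait1 complete before it can launch this step.
step noop1	{ }

session s2
step wakeup2	{ SELECT injection_points_wakeup('injection-points-wait'); }
step detach2	{ SELECT injection_points_detach('injection-points-wait'); }

# detach2 is only launched once noop1 has run, hence after wait1 is done.
permutation wait1 wakeup2 noop1 detach2
```

With an ordering like this, detach2 should have nothing left to wait on when it starts, so neither the <waiting ...> output nor the reordered completion report should show up.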
I have done a total of 10 runs in the CI with the attached, without
getting a failure. HEAD was failing a bit more easily than that, with
at least one failure every 5 runs in my branches. Will go adjust that
in a bit as per the attached.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Extra-tweak-for-injection-test-permutation.patch | text/x-diff | 2.3 KB |