deadlock-hard flakiness

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: deadlock-hard flakiness
Date: 2023-02-08 01:10:21
Message-ID: 20230208011021.winlfnypdbzpr3ic@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On cfbot / CI, we've recently seen a lot of spurious test failures due to
src/test/isolation/specs/deadlock-hard.spec changing output. The failures have
always been on FreeBSD, when running tests against a pre-existing instance.

I'm fairly sure I've seen this failure on the buildfarm as well, but I'm too
impatient to wait for the buildfarm database query (it really should be
updated to use lz4 toast compression).

Example failures:

1)
https://cirrus-ci.com/task/5307793230528512?logs=test_running#L211
https://api.cirrus-ci.com/v1/artifact/task/5307793230528512/testrun/build/testrun/isolation-running/isolation/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/5307793230528512/testrun/build/testrun/runningcheck.log

2)
https://cirrus-ci.com/task/6137098198056960?logs=test_running#L212
https://api.cirrus-ci.com/v1/artifact/task/6137098198056960/testrun/build/testrun/isolation-running/isolation/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/6137098198056960/testrun/build/testrun/runningcheck.log

So far the diff has always been:

diff -U3 /tmp/cirrus-ci-build/src/test/isolation/expected/deadlock-hard.out /tmp/cirrus-ci-build/build/testrun/isolation-running/isolation/results/deadlock-hard.out
--- /tmp/cirrus-ci-build/src/test/isolation/expected/deadlock-hard.out 2023-02-07 05:32:34.536429000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/isolation-running/isolation/results/deadlock-hard.out 2023-02-07 05:40:33.833908000 +0000
@@ -25,10 +25,11 @@
 step s6a7: <... completed>
 step s6c: COMMIT;
 step s5a6: <... completed>
-step s5c: COMMIT;
+step s5c: COMMIT; <waiting ...>
 step s4a5: <... completed>
 step s4c: COMMIT;
 step s3a4: <... completed>
+step s5c: <... completed>
 step s3c: COMMIT;
 step s2a3: <... completed>
 step s2c: COMMIT;

Commit 741d7f1047f fixed a similar issue in deadlock-hard, but it looks like
we need something more. Then again, perhaps this isn't an output ordering issue:

How can we end up with s5c getting reported as waiting? I don't see how s5c
could end up blocking on anything.
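
To make the change in the diff above concrete, here is a minimal Python
sketch. It is only an illustration: the two step lists are copied from the
diff hunk (with the leading "step " dropped), and the helper names are
hypothetical, not part of the isolation tester. It shows that s5c is the only
step newly reported as waiting, which steps were printed before its
completion was reported, and that the outputs match once that wait/completion
pair is collapsed.

```python
# Step lines copied from the diff hunk above, without the "step " prefix.
EXPECTED = [
    "s6a7: <... completed>",
    "s6c: COMMIT;",
    "s5a6: <... completed>",
    "s5c: COMMIT;",
    "s4a5: <... completed>",
    "s4c: COMMIT;",
    "s3a4: <... completed>",
    "s3c: COMMIT;",
    "s2a3: <... completed>",
    "s2c: COMMIT;",
]
ACTUAL = [
    "s6a7: <... completed>",
    "s6c: COMMIT;",
    "s5a6: <... completed>",
    "s5c: COMMIT; <waiting ...>",
    "s4a5: <... completed>",
    "s4c: COMMIT;",
    "s3a4: <... completed>",
    "s5c: <... completed>",
    "s3c: COMMIT;",
    "s2a3: <... completed>",
    "s2c: COMMIT;",
]

WAITING = " <waiting ...>"
COMPLETED = "<... completed>"


def step_name(line):
    """Return the step name, i.e. everything before the first colon."""
    return line.split(":", 1)[0]


def steps_reported_waiting(lines):
    """Names of steps whose line carries the waiting marker."""
    return [step_name(line) for line in lines if line.endswith(WAITING)]


def lines_between_wait_and_completion(lines, step):
    """Lines printed after `step` was reported waiting, before it completed."""
    start = next(
        i for i, line in enumerate(lines)
        if step_name(line) == step and line.endswith(WAITING)
    )
    end = lines.index(f"{step}: {COMPLETED}", start)
    return lines[start + 1:end]


def collapse_waits(lines):
    """Drop waiting markers and the matching deferred completion lines."""
    waiting = set()
    result = []
    for line in lines:
        name = step_name(line)
        if line.endswith(WAITING):
            waiting.add(name)
            result.append(line[: -len(WAITING)])
        elif name in waiting and line == f"{name}: {COMPLETED}":
            waiting.discard(name)  # deferred completion of a waiting step
        else:
            result.append(line)
    return result


if __name__ == "__main__":
    print(steps_reported_waiting(ACTUAL))
    print(lines_between_wait_and_completion(ACTUAL, "s5c"))
    print(collapse_waits(ACTUAL) == EXPECTED)
```

This only restates what the diff shows. It says nothing about why s5c was
considered blocked in the first place.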

Greetings,

Andres Freund
