| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
| Subject: | Re: failures in t/031_recovery_conflict.pl on CI |
| Date: | 2022-05-08 17:59:09 |
| Message-ID: | 3447060.1652032749@sss.pgh.pa.us |
| Lists: | pgsql-hackers |

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2022-05-08 11:28:34 -0400, Tom Lane wrote:
>> Per lapwing's latest results [1], this wasn't enough. I'm again thinking
>> we should pull the whole test from the back branches.

> That failure is different from the earlier failures though. I don't think it's
> a timing issue in the test like the deadlock check one. I rather suspect it's
> indicative of further problems in this area.

Yeah, that was my guess too.

> Potentially the known problem
> with RecoveryConflictInterrupt() running in the signal handler? I think Thomas
> has a patch for that...

Maybe; or given that it's on v10, it could be telling us about some
yet-other problem we perhaps solved since then without realizing
it needed to be back-patched.

> One failure in ~20 runs, on one animal doesn't seem worth disabling the test
> for.

No one is going to thank us for shipping a known-unstable test case.
It does nothing to fix the problem; all it will lead to is possible
failures during package builds. I have no idea whether any packagers
use "make check-world" rather than just "make check" while building.
But if they do, even fairly low-probability failures can be problematic.

(I still carry the scars I acquired while working at Red Hat and being
responsible for packaging mysql: at least back then, their test suite
was full of cases that mostly worked fine, except when getting stressed
in Red Hat's build farm. Dealing with a test suite that fails 50% of
the time under load, while trying to push out an urgent security fix,
is NOT a pleasant situation.)

I'm happy to have this test in the stable branches once we have committed
fixes that address all known problems. Until then, it will just be
a nuisance for anyone who is not a developer working on those problems.

regards, tom lane