Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-02-13 01:53:10
Message-ID: 20220213015310.rjipagnmt2x7mmqs@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-12 11:47:20 -0500, Tom Lane wrote:
> Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> > 11.02.2022 05:22, Andres Freund wrote:
> >> Over in another thread I made some wild unsubstantiated guesses that the
> >> windows issues could have been made much more likely by a somewhat odd bit of
> >> code in PQisBusy():
> >> https://postgr.es/m/1959196.1644544971%40sss.pgh.pa.us
> >> Alexander, any chance you'd try if that changes the likelihood of the problem
> >> occurring, without any other fixes / reverts applied?
>
> > Unfortunately I haven't seen an improvement for the test in question.

Thanks for testing!

> Yeah, that's what I expected, sadly. While I think this PQisBusy behavior
> is definitely a bug, it will not lead to an infinite loop, just to write
> failures being reported in a less convenient fashion than intended.

FWIW, I didn't think it'd end up looping indefinitely, but that there's a
chance it could end up waiting indefinitely. The WaitLatchOrSocket() doesn't
have a timeout, and if I understand the Windows FD_CLOSE stuff correctly,
you're not guaranteed to get an event from WaitForMultipleObjects() if
FD_CLOSE was already consumed and there isn't any data to read.

ISTM that it's not a great idea for libpqrcv_receive() to do blocking IO at
all. The caller expects it to not block...

> I wonder whether it would help to put a PQconsumeInput call *before*
> the PQisBusy loop, so that any pre-existing EOF condition will be
> detected. If you don't like duplicating code, we could restructure
> the loop as

That does look a bit saner. Even leaving EOF and Windows issues aside, it
seems weird to do a WaitLatchOrSocket() without having tried to read more
data.
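
To illustrate the ordering only: below is a rough sketch of "try to read
first, wait only if nothing is there". It is not the libpqrcv_receive() code
and does not use libpq or WaitLatchOrSocket(). It assumes a plain POSIX
non-blocking fd and poll(), and the names try_read() and
receive_read_first() are made up for the example. The point is that an EOF
that already happened is noticed by the first read, so we never go to sleep
waiting for an event that may not arrive again.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libpq): one non-blocking read attempt.
 * Returns bytes read (> 0), 0 if nothing is available right now,
 * or -1 on EOF or a hard error.  fd must be in O_NONBLOCK mode.
 */
static int
try_read(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n > 0)
        return (int) n;
    if (n == 0)
        return -1;              /* EOF: the peer closed the connection */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;               /* no data yet */
    return -1;                  /* hard error */
}

/*
 * Hypothetical receive routine: read first, wait second.
 * timeout_ms is passed to poll(); -1 means wait without a timeout.
 * Same return convention as try_read().
 */
int
receive_read_first(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = {.fd = fd, .events = POLLIN};
    int         n;
    int         rc;

    assert(len > 0);

    /* Consume whatever is already there, including a pre-existing EOF. */
    n = try_read(fd, buf, len);
    if (n != 0)
        return n;

    /* Only now is it reasonable to sleep until the fd becomes readable. */
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return (errno == EINTR) ? 0 : -1;
    if (rc == 0)
        return 0;               /* timed out, still nothing to read */

    return try_read(fd, buf, len);
}
```

With the read attempt in front, a caller that passes timeout_ms = -1 on an
already-closed connection gets -1 back immediately instead of blocking.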

Greetings,

Andres Freund
