Re: Recent 027_streaming_regress.pl hangs

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recent 027_streaming_regress.pl hangs
Date: 2024-03-21 02:50:24
Message-ID: 20240321025024.ohozgkijorpp3ejx@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2024-03-20 17:41:47 -0700, Andres Freund wrote:
> There's a lot of other animals on the same machine, however it's rarely fully
> loaded (with either CPU or IO).
>
> I don't think the test just being slow is the issue here, e.g. in the last
> failing iteration
>
> [...]
>
> I suspect we have some more fundamental instability at our hands, there have
> been failures like this going back a while, and on various machines.

I'm somewhat confused by the timestamps in the log:

[22:07:50.263](223.929s) ok 2 - regression tests pass
...
[22:14:02.051](371.788s) # poll_query_until timed out executing this query:

I read this as 371.788s having passed between the messages, which of course is
much higher than PostgreSQL::Test::Utils::timeout_default=180.

Ah.

The way that poll_query_until() implements timeouts seems decidedly
suboptimal. If a psql invocation, including query processing, takes any
appreciable amount of time, poll_query_until() waits much longer than it
should, because it very naively determines the number of attempts ahead of time:

	my $max_attempts = 10 * $PostgreSQL::Test::Utils::timeout_default;
	my $attempts = 0;

	while ($attempts < $max_attempts)
	{
		...

		# Wait 0.1 second before retrying.
		usleep(100_000);

		$attempts++;
	}

Ick.
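
To put rough numbers on that, here is a back-of-the-envelope sketch. The
helper name effective_timeout() and the per-query cost are assumptions made
for illustration, not anything in the test framework; it just multiplies the
fixed attempt count by the time one loop iteration takes:

```perl
use strict;
use warnings;

# Hypothetical helper: wall-clock time a fixed-attempt loop can run for when
# each iteration costs one query ($per_query_s) plus one sleep ($sleep_s).
sub effective_timeout
{
	my ($timeout_default, $sleep_s, $per_query_s) = @_;

	# Same attempt budget as the loop quoted above.
	my $max_attempts = 10 * $timeout_default;

	return $max_attempts * ($sleep_s + $per_query_s);
}

# With a free query the loop honors the nominal 180s ...
printf "%.1f\n", effective_timeout(180, 0.1, 0);      # prints 180.0
# ... but an assumed 0.1s per psql invocation already doubles it.
printf "%.1f\n", effective_timeout(180, 0.1, 0.1);    # prints 360.0
```

So a psql round trip on the order of a tenth of a second would be enough to
land in the neighborhood of the 371.788s above, assuming most of that
interval was spent inside the loop.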

What's worse is that if the query takes too long, the timeout afaict never
takes effect.
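
For comparison, a minimal sketch of a loop that checks elapsed time against
a deadline instead of counting attempts. This is not a proposed patch;
poll_until_deadline() and its arguments are hypothetical names, and the
check is an arbitrary code ref standing in for the psql invocation:

```perl
use strict;
use warnings;
use Time::HiRes qw(time usleep);

# Hypothetical helper, not the existing poll_query_until(): call $check until
# it returns true or $timeout_s seconds of wall-clock time have elapsed.
sub poll_until_deadline
{
	my ($check, $timeout_s, $interval_s) = @_;

	# Fix the deadline once, so slow checks eat into the budget instead of
	# extending it.
	my $deadline = time() + $timeout_s;

	while (1)
	{
		return 1 if $check->();
		return 0 if time() >= $deadline;

		# Wait before retrying.
		usleep(int($interval_s * 1_000_000));
	}
}

# Usage: a check that succeeds on its third call.
my $n = 0;
my $ok = poll_until_deadline(sub { return ++$n >= 3 }, 2, 0.01);
print $ok ? "ok after $n attempts\n" : "timed out\n";
```

This only bounds the time between attempts. A single check that blocks
would still need its own timeout, so it does not address the second problem
by itself.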

Greetings,

Andres Freund
