Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-08-30 19:21:25
Message-ID: 578934.1725045685@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> Let me show you another related anomaly, which drongo kindly discovered
> recently: [1]. That test failed with:
> SELECT dblink_cancel_query('dtest1');
> - dblink_cancel_query
> ----------------------
> - OK
> + dblink_cancel_query
> +--------------------------
> + cancel request timed out
> (1 row)

While we're piling on, has anyone noticed that *non* Windows buildfarm
animals are also failing this test pretty frequently? The most recent
occurrence is at [1], and it looks like this:

diff -U3 /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out
--- /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out 2024-07-22 11:09:50.638133878 +0000
+++ /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out 2024-08-30 06:28:01.971083945 +0000
@@ -17,4 +17,5 @@
SET LOCAL statement_timeout = '10ms';
select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
ERROR: canceling statement due to statement timeout
+WARNING: could not get result of cancel request due to timeout
COMMIT;

I trawled the buildfarm database for other occurrences of "could not
get result of cancel request" since this test went in. I found 34
of them (see attachment), and none that weren't the timeout flavor.

Most of the failing machines are not especially slow, so even though
the hard-wired 30-second timeout that's being used here feels a little
under-engineered, I'm not sure that arranging to raise it would help.
My spidey sense feels that there's some actual bug here, but it's hard
to say where. mylodon's postmaster log confirms that the 30 seconds
did elapse, and that there wasn't anything much else going on:

2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel ERROR: canceling statement due to statement timeout
2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel STATEMENT: select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5;
2024-08-30 06:28:01.946 UTC client backend[3668381] pg_regress/query_cancel WARNING: could not get result of cancel request due to timeout
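
As a quick sanity check on the gap between the ERROR and WARNING lines above, here is a minimal Python sketch. The timestamps are copied from the log excerpt; the variable names are only illustrative and are not part of any PostgreSQL tooling.

```python
from datetime import datetime

# Timestamps copied from the two postmaster log lines above (both UTC).
FMT = "%Y-%m-%d %H:%M:%S.%f"
error_at = datetime.strptime("2024-08-30 06:27:31.926", FMT)    # statement timeout ERROR
warning_at = datetime.strptime("2024-08-30 06:28:01.946", FMT)  # cancel-result WARNING

# Time that passed between the two log entries, in seconds.
elapsed = (warning_at - error_at).total_seconds()
print(f"{elapsed:.3f} s elapsed between the ERROR and the WARNING")
```

The difference comes out at just over 30 seconds, consistent with the hard-wired timeout having run to completion rather than the wait being cut short.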

Any thoughts?

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2006%3A25%3A46

Attachment: cancel-request-failures.txt (text/plain, 6.3 KB)
