From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
Subject: | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date: | 2024-08-30 19:21:25 |
Message-ID: | 578934.1725045685@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> Let me show you another related anomaly, which drongo kindly discovered
> recently: [1]. That test failed with:
> SELECT dblink_cancel_query('dtest1');
> - dblink_cancel_query
> ----------------------
> - OK
> + dblink_cancel_query
> +--------------------------
> + cancel request timed out
> (1 row)
While we're piling on, has anyone noticed that *non*-Windows buildfarm
animals are also failing this test pretty frequently? The most recent
occurrence is at [1], and it looks like this:
diff -U3 /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out
--- /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out 2024-07-22 11:09:50.638133878 +0000
+++ /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out 2024-08-30 06:28:01.971083945 +0000
@@ -17,4 +17,5 @@
 SET LOCAL statement_timeout = '10ms';
 select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
 ERROR: canceling statement due to statement timeout
+WARNING: could not get result of cancel request due to timeout
 COMMIT;
I trawled the buildfarm database for other occurrences of "could not
get result of cancel request" since this test went in. I found 34
of them (see attachment), and none that weren't the timeout flavor.
Most of the failing machines are not especially slow, so even though
the hard-wired 30 second timeout that's being used here feels a little
under-engineered, I'm not sure that arranging to raise it would help.
My spidey sense feels that there's some actual bug here, but it's hard
to say where. mylodon's postmaster log confirms that the 30 seconds
did elapse, and that there wasn't anything much else going on:
2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel ERROR: canceling statement due to statement timeout
2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel STATEMENT: select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5;
2024-08-30 06:28:01.946 UTC client backend[3668381] pg_regress/query_cancel WARNING: could not get result of cancel request due to timeout
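For anyone who wants to check the arithmetic on those timestamps, here is a minimal sketch in Python. It only subtracts the two log timestamps quoted above; the variable and function names are made up for illustration and are not part of any PostgreSQL tooling.

```python
from datetime import datetime

# Timestamps copied verbatim from the two postmaster log lines quoted above.
# The names below are hypothetical, chosen only for this illustration.
timeout_error_at = "2024-08-30 06:27:31.926"   # "canceling statement due to statement timeout"
cancel_warning_at = "2024-08-30 06:28:01.946"  # "could not get result of cancel request due to timeout"

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


gap = elapsed_seconds(timeout_error_at, cancel_warning_at)
print(f"gap between ERROR and WARNING: {gap:.3f} s")
```

The gap comes out at just over 30 seconds, which matches the hard-wired timeout mentioned above.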
Any thoughts?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2006%3A25%3A46
Attachment | Content-Type | Size |
---|---|---|
cancel-request-failures.txt | text/plain | 6.3 KB |