From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
Subject: | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date: | 2024-10-01 12:00:00 |
Message-ID: | c68225b4-fce9-3425-1534-a21a815d5846@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Tom and Jelte,
31.08.2024 07:04, Alexander Lakhin wrote:
> I've tested your fix with the modification I proposed upthread:
> idle_session_timeout_enabled = false;
> }
> +if (rand() % 10 == 0) pg_usleep(10000);
> /*
> * (5) disable async signal conditions again.
>
> and can confirm that the issue is gone. On 8749d850f~1, the test failed
> on iterations 3, 3. 12 for me, but on current REL_17_STABLE, 100 test
> iterations succeeded.
>
One month later, treehopper has found a way to break that test: [1].
The failure log contains:
2024-09-30 19:34:31.347 UTC [3201034:13] fdw_retry_check LOG: execute <unnamed>: DECLARE c2 CURSOR FOR
SELECT count(*) FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) INNER JOIN "S 1"."T 1" r4 ON (TRUE))
INNER JOIN "S 1"."T 1" r6 ON (TRUE))
2024-09-30 19:34:31.464 UTC [3201033:10] pg_regress/query_cancel ERROR: canceling statement due to statement timeout
2024-09-30 19:34:31.464 UTC [3201033:11] pg_regress/query_cancel STATEMENT: SELECT count(*) FROM ft1 a CROSS JOIN ft1 b
CROSS JOIN ft1 c CROSS JOIN ft1 d;
2024-09-30 19:34:31.466 UTC [3201035:1] [unknown] LOG: connection received: host=[local]
2024-09-30 19:34:31.474 UTC [3201034:14] fdw_retry_check LOG: statement: FETCH 100 FROM c2
2024-09-30 19:35:01.485 UTC [3201033:12] pg_regress/query_cancel WARNING: could not get result of cancel request due to
timeout
It looks like this time the cancel request arrived to the remote backend
when it processed FETCH, presumably at the DoingCommandRead stage.
I've managed to reproduce the issue with the additional modification:
@@ -1605,7 +1605,10 @@ postgresIterateForeignScan(ForeignScanState *node)
* first call after Begin or ReScan.
*/
if (!fsstate->cursor_exists)
+{
create_cursor(node);
+pg_usleep(100000);
+}
With postgres_fdw/Makefile modified to repeat the query_cancel test, I get:
ok 13 - query_cancel 245 ms
not ok 14 - query_cancel 30258 ms
ok 15 - query_cancel 249 ms
...
ok 19 - query_cancel 236 ms
not ok 20 - query_cancel 30258 ms
ok 21 - query_cancel 225 ms
..
not ok 33 - query_cancel 30272 ms
1..33
# 3 of 33 tests failed.
(Please find attached the complete patch.)
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=treehopper&dt=2024-09-30%2019%3A21%3A14
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
reproduce-postgres_fdw-query_cancel-failure.pathch | text/plain | 1.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-10-01 12:04:07 | Re: On non-Windows, hard depend on uselocale(3) |
Previous Message | Jim Jones | 2024-10-01 11:46:44 | Re: Truncate logs by max_log_size |