Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-10-01 12:00:00
Message-ID: c68225b4-fce9-3425-1534-a21a815d5846@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Tom and Jelte,

31.08.2024 07:04, Alexander Lakhin wrote:
> I've tested your fix with the modification I proposed upthread:
>              idle_session_timeout_enabled = false;
>          }
> +if (rand() % 10 == 0) pg_usleep(10000);
>          /*
>           * (5) disable async signal conditions again.
>
> and can confirm that the issue is gone. On 8749d850f~1, the test failed
> on iterations 3, 3. 12 for me, but on current REL_17_STABLE, 100 test
> iterations succeeded.
>

One month later, treehopper has found a way to break that test: [1].
The failure log contains:
2024-09-30 19:34:31.347 UTC [3201034:13] fdw_retry_check LOG: execute <unnamed>: DECLARE c2 CURSOR FOR
    SELECT count(*) FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) INNER JOIN "S 1"."T 1" r4 ON (TRUE))
INNER JOIN "S 1"."T 1" r6 ON (TRUE))
2024-09-30 19:34:31.464 UTC [3201033:10] pg_regress/query_cancel ERROR:  canceling statement due to statement timeout
2024-09-30 19:34:31.464 UTC [3201033:11] pg_regress/query_cancel STATEMENT:  SELECT count(*) FROM ft1 a CROSS JOIN ft1 b
CROSS JOIN ft1 c CROSS JOIN ft1 d;
2024-09-30 19:34:31.466 UTC [3201035:1] [unknown] LOG:  connection received: host=[local]
2024-09-30 19:34:31.474 UTC [3201034:14] fdw_retry_check LOG: statement: FETCH 100 FROM c2
2024-09-30 19:35:01.485 UTC [3201033:12] pg_regress/query_cancel WARNING:  could not get result of cancel request due to
timeout

It looks like this time the cancel request arrived to the remote backend
when it processed FETCH, presumably at the DoingCommandRead stage.
I've managed to reproduce the issue with the additional modification:
@@ -1605,7 +1605,10 @@ postgresIterateForeignScan(ForeignScanState *node)
      * first call after Begin or ReScan.
      */
     if (!fsstate->cursor_exists)
+{
         create_cursor(node);
+pg_usleep(100000);
+}

With postgres_fdw/Makefile modified to repeat the query_cancel test, I get:
ok 13        - query_cancel                              245 ms
not ok 14    - query_cancel                            30258 ms
ok 15        - query_cancel                              249 ms
...
ok 19        - query_cancel                              236 ms
not ok 20    - query_cancel                            30258 ms
ok 21        - query_cancel                              225 ms
..
not ok 33    - query_cancel                            30272 ms
1..33
# 3 of 33 tests failed.

(Please find attached the complete patch.)

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=treehopper&dt=2024-09-30%2019%3A21%3A14

Best regards,
Alexander

Attachment Content-Type Size
reproduce-postgres_fdw-query_cancel-failure.pathch text/plain 1.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-10-01 12:04:07 Re: On non-Windows, hard depend on uselocale(3)
Previous Message Jim Jones 2024-10-01 11:46:44 Re: Truncate logs by max_log_size