From: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date: | 2024-03-13 10:04:43 |
Message-ID: | CAGECzQRvmUK5-d68A+cm+fgmfht9Dv2uZ28-qq3QiaF6EAZqPQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 13 Mar 2024 at 04:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I suspect it's basically just a
> timing dependency. Have you thought about the fact that a cancel
> request is a no-op if it arrives after the query's done?
I agree it's probably a timing issue. The cancel being received after
the query is done seems very unlikely, since the query takes 180
seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these
animals). I think it's more likely that the cancel request arrives too
early and is thus ignored because no query is running yet. To solve
that issue, the test already had logic to wait until the query backend
was in the "active" state before sending a cancel. But my guess is
that that somehow isn't enough.
Sadly I'm having a hard time reliably reproducing this race condition
locally. So it's hard to be sure what is happening here. Attached is a
patch with a wild guess as to what the issue might be (i.e. seeing an
outdated "active" state and thus passing the check even though the
query is not running yet).
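To make the suspected race concrete, here is a small simulation of it. This is only an illustrative sketch, not what the attached patch does: `ActivitySnapshot`, `naive_ready`, `strict_ready` and `first_ready_index` are hypothetical names, the snapshots are a stand-in for rows polled from pg_stat_activity, and matching on the query text is just one assumed way to tell a stale "active" state from a fresh one.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ActivitySnapshot:
    """Hypothetical stand-in for one polled row of pg_stat_activity."""
    state: str  # e.g. "idle" or "active"
    query: str  # text of the most recent query on that backend


def naive_ready(snap: ActivitySnapshot) -> bool:
    # Only looks at the state, so a leftover "active" from an
    # earlier query on the same connection passes the check.
    return snap.state == "active"


def strict_ready(snap: ActivitySnapshot, expected_query: str) -> bool:
    # Assumed stricter check: the state must be "active" AND the
    # backend must be running the query we intend to cancel.
    return snap.state == "active" and snap.query == expected_query


def first_ready_index(
    snapshots: Sequence[ActivitySnapshot],
    is_ready: Callable[[ActivitySnapshot], bool],
) -> Optional[int]:
    # Index of the first poll at which the test would send the cancel,
    # or None if the check never passes.
    for i, snap in enumerate(snapshots):
        if is_ready(snap):
            return i
    return None


# Simulated polls: a stale "active" from a previous query, then idle,
# then the long-running query the test actually wants to cancel.
target = "SELECT pg_sleep(180)"
timeline = [
    ActivitySnapshot("active", "SELECT 1"),
    ActivitySnapshot("idle", "SELECT 1"),
    ActivitySnapshot("active", target),
]

print(first_ready_index(timeline, naive_ready))                          # 0
print(first_ready_index(timeline, lambda s: strict_ready(s, target)))    # 2
```

With the state-only check the cancel would be sent at the first poll, before the target query is running, so it would be ignored. The stricter check waits until the third poll.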
Attachment | Content-Type | Size |
---|---|---|
v37-0001-Hopefully-make-cancel-test-more-reliable.patch | application/octet-stream | 4.2 KB |
v37-0002-Start-using-new-libpq-cancel-APIs.patch | application/octet-stream | 10.7 KB |