From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
Subject: | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date: | 2024-08-28 18:00:00 |
Message-ID: | 5ea25e4d-1ee2-b9bf-7806-119ffa658826@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Alvaro,
Let me show you another related anomaly, which drongo kindly discovered
recently: [1]. That test failed with:
SELECT dblink_cancel_query('dtest1');
- dblink_cancel_query
----------------------
- OK
+ dblink_cancel_query
+--------------------------
+ cancel request timed out
(1 row)
I've managed to reproduce this when running 20 dblink tests in parallel,
and with extra logging added (see attached) I've got:
...
2024-08-28 10:17:12.949 PDT [8236:204] pg_regress/dblink LOG: statement: SELECT dblink_cancel_query('dtest1');
!!!PQcancelPoll|8236| conn->status: 2
!!!PQcancelPoll|8236| conn->status: 3
!!!PQconnectPoll|8236| before pqPacketSend(..., &cancelpacket, ...)
!!!pqPacketSend|8236| before pqFlush
!!!pqsecure_raw_write|8236| could not send data to server: Socket is not connected (0x00002749/10057)
!!!pqPacketSend|8236| after pqFlush, STATUS_OK
!!!PQconnectPoll|8236| after pqPacketSend, STATUS_OK
2024-08-28 10:17:12.950 PDT [5548:7] pg_regress LOG: statement: select * from foo where f1 < 3
2024-08-28 10:17:12.951 PDT [8692:157] DEBUG: forked new backend, pid=4644 socket=5160
2024-08-28 10:17:12.973 PDT [4644:1] [unknown] LOG: connection received: host=::1 port=55073
2024-08-28 10:17:12.973 PDT [4644:2] [unknown] LOG: !!!BackendInitialize| before ProcessSSLStartup()
!!!PQcancelPoll|8236| conn->status: 4
!!!PQcancelPoll|8236| conn->status: 4
2024-08-28 10:17:24.060 PDT [1436:1] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/194C4E0 oldest xid 780
latest complete 779 next xid 780)
!!!PQcancelPoll|8236| conn->status: 4
2024-08-28 10:17:42.951 PDT [4644:3] [unknown] LOG: !!!BackendInitialize| ProcessSSLStartup() returned -1
2024-08-28 10:17:42.951 PDT [4644:4] [unknown] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
...
Thus, pqsecure_raw_write(), called via PQcancelPoll() -> PQconnectPoll() ->
pqPacketSend() -> pqFlush) -> pqSendSome() -> pqsecure_write(), returned
the WSAENOTCONN error, but it wasn't noticed at upper levels.
Consequently, the cancelling backend waited for the cancel packet that was
never sent.
The first commit, that I could reproduce this test failure on, is 2466d6654.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-26%2021%3A35%3A04
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
libpqsrv_cancel-debugging.patch | text/x-patch | 3.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2024-08-28 18:01:59 | Re: PATCH: Add hooks for pg_total_relation_size and pg_indexes_size |
Previous Message | Bruce Momjian | 2024-08-28 17:45:09 | Re: Detailed release notes |