Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-08-28 18:00:00
Message-ID: 5ea25e4d-1ee2-b9bf-7806-119ffa658826@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Alvaro,

Let me show you another related anomaly, which drongo kindly discovered
recently: [1]. That test failed with:
 SELECT dblink_cancel_query('dtest1');
- dblink_cancel_query
----------------------
- OK
+   dblink_cancel_query
+--------------------------
+ cancel request timed out
 (1 row)

I've managed to reproduce this when running 20 dblink tests in parallel,
and with extra logging added (see attached) I've got:
...
2024-08-28 10:17:12.949 PDT [8236:204] pg_regress/dblink LOG: statement: SELECT dblink_cancel_query('dtest1');
!!!PQcancelPoll|8236| conn->status: 2
!!!PQcancelPoll|8236| conn->status: 3
!!!PQconnectPoll|8236| before pqPacketSend(..., &cancelpacket, ...)
!!!pqPacketSend|8236| before pqFlush
!!!pqsecure_raw_write|8236| could not send data to server: Socket is not connected (0x00002749/10057)
!!!pqPacketSend|8236| after pqFlush, STATUS_OK
!!!PQconnectPoll|8236| after pqPacketSend, STATUS_OK
2024-08-28 10:17:12.950 PDT [5548:7] pg_regress LOG:  statement: select * from foo where f1 < 3
2024-08-28 10:17:12.951 PDT [8692:157] DEBUG:  forked new backend, pid=4644 socket=5160
2024-08-28 10:17:12.973 PDT [4644:1] [unknown] LOG:  connection received: host=::1 port=55073
2024-08-28 10:17:12.973 PDT [4644:2] [unknown] LOG: !!!BackendInitialize| before ProcessSSLStartup()
!!!PQcancelPoll|8236| conn->status: 4
!!!PQcancelPoll|8236| conn->status: 4
2024-08-28 10:17:24.060 PDT [1436:1] DEBUG:  snapshot of 0+0 running transaction ids (lsn 0/194C4E0 oldest xid 780
latest complete 779 next xid 780)
!!!PQcancelPoll|8236| conn->status: 4
2024-08-28 10:17:42.951 PDT [4644:3] [unknown] LOG: !!!BackendInitialize| ProcessSSLStartup() returned -1
2024-08-28 10:17:42.951 PDT [4644:4] [unknown] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
...

Thus, pqsecure_raw_write(), called via PQcancelPoll() -> PQconnectPoll() ->
pqPacketSend() -> pqFlush) -> pqSendSome() -> pqsecure_write(), returned
the WSAENOTCONN error, but it wasn't noticed at upper levels.
Consequently, the cancelling backend waited for the cancel packet that was
never sent.

The first commit, that I could reproduce this test failure on, is 2466d6654.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-26%2021%3A35%3A04

Best regards,
Alexander

Attachment Content-Type Size
libpqsrv_cancel-debugging.patch text/x-patch 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-08-28 18:01:59 Re: PATCH: Add hooks for pg_total_relation_size and pg_indexes_size
Previous Message Bruce Momjian 2024-08-28 17:45:09 Re: Detailed release notes