Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-07-17 19:00:00
Message-ID: f92ce13f-cbad-769d-72df-f5b87717f375@gmail.com
Lists: pgsql-hackers

Hello Thomas,

17.07.2024 03:05, Thomas Munro wrote:
> On Wed, Jul 17, 2024 at 3:08 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> Ugh. I tried to follow what's going on in that cygwin code, but I gave
>> up pretty quickly. It depends on a mutex, but I didn't see the mutex
>> being defined or initialized anywhere.
> https://github.com/cygwin/cygwin/blob/cygwin-3.5.3/winsup/cygwin/fhandler/socket_inet.cc#L217C1-L217C77
>
> Not obvious how it'd be deadlocking (?), though... it's hard to see
> how anything between LOCK_EVENTS and UNLOCK_EVENTS could escape/return
> early. (Something weird going on with signal handlers? I can't
> imagine where one would call poll() though).

I've simplified the repro to the following:
echo "
-- setup foreign server "loopback" --

CREATE TABLE t1(i int);
CREATE FOREIGN TABLE ft1 (i int) SERVER loopback OPTIONS (table_name 't1');
CREATE FOREIGN TABLE ft2 (i int) SERVER loopback OPTIONS (table_name 't1');

INSERT INTO t1 SELECT i FROM generate_series(1, 100000) g(i);
" | psql

cat << 'EOF' | psql
Select pg_sleep(10);
SET statement_timeout = '10ms';
SELECT 'SELECT count(*) FROM ft1 CROSS JOIN ft2;' FROM generate_series(1, 100)
\gexec
EOF

I've attached strace (with --mask=0x251, per [1]) to the query-cancelling
backend and got strace.log (see attachment), while observing:
ERROR:  canceling statement due to statement timeout
...
ERROR:  canceling statement due to statement timeout
-- 14 lines in total, then the process hung --
-- I interrupted it several seconds later --

As far as I can see (having analyzed a number of runs), the hang occurs
when some itimer-related activity happens before "peek_socket" in this
event sequence:
[main] postgres {pid} select_stuff::wait: res after verify 0
[main] postgres {pid} select_stuff::wait: returning 0
[main] postgres {pid} select: sel.wait returns 0
[main] postgres {pid} peek_socket: read_ready: 0, write_ready: 1, except_ready: 0

(See the last occurrence of the sequence in the log.)
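
For anyone who wants to check other runs for the same pattern, here is a rough sketch of how the log could be scanned. It assumes each occurrence of the sequence starts with a "select_stuff::wait: res after verify" line and ends with a "peek_socket:" line, as in the lines quoted above, and it treats any line containing "itimer" as timer activity. That last rule and the function name are only my assumptions for illustration; the actual strace output may label timer events differently.

```python
import re

# Markers taken from the log lines quoted in this message.
SEQUENCE_START = "select_stuff::wait: res after verify"
SEQUENCE_END = "peek_socket:"
# Assumption: any line mentioning "itimer" counts as timer activity.
TIMER_PATTERN = re.compile(r"itimer", re.IGNORECASE)


def find_timer_interleavings(lines):
    """Return (start_index, timer_lines) for every start..peek_socket
    window that has at least one itimer-related line inside it."""
    hits = []
    window_start = None
    timer_lines = []
    for idx, line in enumerate(lines):
        if SEQUENCE_START in line:
            # A new window begins; forget anything seen in a previous one.
            window_start = idx
            timer_lines = []
        elif window_start is not None:
            if SEQUENCE_END in line:
                # Window closed: report it only if timer lines were seen.
                if timer_lines:
                    hits.append((window_start, timer_lines))
                window_start = None
            elif TIMER_PATTERN.search(line):
                timer_lines.append(line.rstrip("\n"))
    return hits
```

Passing an open file object for strace.log (or a list of its lines) returns the zero-based index at which each suspicious window starts, together with the timer lines found inside it.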

[1] https://cygwin.com/cygwin-ug-net/strace.html

Best regards,
Alexander

Attachment Content-Type Size
strace.log text/x-log 348.3 KB
