From: | "Tristan Partin" <tristan(at)neon(dot)tech> |
---|---|
To: | "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>, "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Subject: | Re: psql not responding to SIGINT upon db reconnection |
Date: | 2023-11-29 17:48:51 |
Message-ID: | CXBHH2O58HVA.JUP07AES8JVK@neon.tech |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu Nov 23, 2023 at 3:19 AM CST, Heikki Linnakangas wrote:
> On 22/11/2023 23:29, Tristan Partin wrote:
> > Ha, you're right. I had this working yesterday, but convinced myself it
> > didn't. I had a do while loop wrapping the blocking call. Here is a v4,
> > which seems to pass the tests that were pointed out to be failing
> > earlier.
>
> Thanks! This suffers from a classic race condition:
>
> > + if (cancel_pressed)
> > + break;
> > +
> > + sock = PQsocket(n_conn);
> > + if (sock == -1)
> > + break;
> > +
> > + rc = pqSocketPoll(sock, for_read, !for_read, (time_t)-1);
> > + Assert(rc != 0); /* Timeout is impossible. */
> > + if (rc == -1)
> > + {
> > + success = false;
> > + break;
> > + }
>
> If a signal arrives between the "if (cancel_pressed)" check and
> pqSocketPoll(), pgSocketPoll will "miss" the signal and block
> indefinitely. There are three solutions to this:
>
> 1. Use a timeout, so that you wake up periodically to check for any
> missed signals. Easy solution, the downsides are that you will not react
> as quickly if the signal is missed, and you waste some cycles by waking
> up unnecessarily.
>
> 2. The Self-pipe trick: https://cr.yp.to/docs/selfpipe.html. We also use
> that in src/backend/storage/ipc/latch.c. It's portable, but somewhat
> complicated.
>
> 3. Use pselect() or ppoll(). They were created specifically to address
> this problem. Not fully portable, I think it's missing on Windows at least.
>
> Maybe use pselect() if it's available, and select() with a timeout if
> it's not.
First I have ever heard of this race condition, and now I will commit it
to durable storage :). I went ahead and did option #3 that you proposed.
On Windows, I have a 1 second timeout for the select/poll. That seemed
like a good balance of user experience and spinning the CPU. But I am
open to other values. I don't have a Windows VM, but maybe I should set
one up...
I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.
For what it's worth, I did try #2, but since psql installs its own
SIGINT handler, the code became really hairy.
--
Tristan Partin
Neon (https://neon.tech)
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patch | text/x-patch | 8.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-11-29 17:52:01 | Re: Fix assertion in autovacuum worker |
Previous Message | Andres Freund | 2023-11-29 17:42:35 | Re: remaining sql/json patches |