From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver |
Date: | 2017-03-17 15:54:03 |
Message-ID: | 5b6a6d6d-fb45-0afb-2e95-5600063c3dbd@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
On 17/03/17 16:07, Tom Lane wrote:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>> I have confirmed on jacana Petr's observation that adding a timeout to
>> the WaitLatchOrSocket cures the problem.
>
> Does anyone have a theory as to why that cures the problem?
>
I now have theory and even PoC patch for it.
The long wait of WaitLatchOrSocket happens after connection state
switches from CONNECTION_STARTED to CONNECTION_MADE in both cases the
return value of PQconnectPoll is PGRES_POLLING_WRITING so we wait for
FD_WRITE.
Now the documentation for WSAEventSelect says "The FD_WRITE network
event is handled slightly differently. An FD_WRITE network event is
recorded when a socket is first connected with a call to the connect,
ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
when a socket is accepted with accept, AcceptEx, or WSAAccept function
and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
available. Therefore, an application can assume that sends are possible
starting from the first FD_WRITE network event setting and lasting until
a send returns WSAEWOULDBLOCK. After such a failure the application will
find out that sends are again possible when an FD_WRITE network event is
recorded and the associated event object is set."
But while PQconnectPoll does connect() before setting connection status
to CONNECTION_STARTED and returns PGRES_POLLING_WRITING so the FD_WRITE
eventually happens, it does not do any writes in the code block that
switches to CONNECTION_MADE and PGRES_POLLING_WRITING. That means
FD_WRITE event is never recorded as per quoted documentation.
Then what remains is why it works in libpq. If you look at
pgwin32_select which is eventually called by libpq code, it actually
handles the situation by trying empty WSASend on any socket that is
supposed to wait for FD_WRITE and only calling the
WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
the WSASend succeeds it immediately returns ok.
So I did something similar in attached for WaitEventSetWaitBlock() and
it indeed solves the issue my windows test machine. Now the
coding/placement probably isn't the best (and there are no comments),
but maybe somebody will find proper place for this now that we know the
cause.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
fix-wait-for-writable-socket-on-windows.patch | text/x-patch | 1.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-03-17 16:04:05 | Re: pgsql: Rename "pg_clog" directory to "pg_xact". |
Previous Message | Tom Lane | 2017-03-17 15:07:46 | Re: [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver |
From | Date | Subject | |
---|---|---|---|
Next Message | Surafel Temesgen | 2017-03-17 15:58:17 | Re: Adding the optional clause 'AS' in CREATE TRIGGER |
Previous Message | Osahon Oduware | 2017-03-17 15:43:52 | Re: QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions |