From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, "Augustine, Jobin" <jobin(dot)augustine(at)openscg(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Replication to Postgres 10 on Windows is broken |
Date: | 2017-08-06 17:14:36 |
Message-ID: | 20170806171436.ve646fu4bpagdrc2@alap3.anarazel.de |
Lists: | pgsql-bugs pgsql-hackers |
Hi,
On 2017-08-06 12:29:07 -0400, Tom Lane wrote:
> Yeah. After some digging around I think I see exactly what is happening.
> The error message would be better read as "Socket is not connected *yet*",
> that is, the problem is that we're trying to write data before the
> nonblocking connection request has completed. (This fits with the OP's
> observation that local loopback connections work fine --- they probably
> complete immediately.) PQconnectPoll believes that it just has to wait
> for write-ready when waiting for a connection to complete. When using
> connectDBComplete's wait loop, that reduces to a call to Windows' version
> of select(2), in pqSocketPoll, and according to
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms740141(v=vs.85).aspx
>
> "The parameter writefds identifies the sockets that are to be checked for
> writability. If a socket is processing a connect call (nonblocking), a
> socket is writeable if the connection establishment successfully
> completes."
>
> On the other hand, in libpqwalreceiver, we're depending on latch.c's
> implementation, and it uses WSAEventSelect's FD_WRITE event:
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
>
> If I'm reading that correctly, FD_WRITE is set instantly by the connect
> request, probably even in the nonblock case, and it only gets cleared
> by a failed write request. It looks to me like we would have to
> specifically look for FD_CONNECT, *not* FD_WRITE, to make this work.
Nice digging.
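For reference, here is a rough sketch of the select()-style wait that the libpq path boils down to: start a nonblocking connect, wait for the socket to become writable, then check SO_ERROR to see whether the connect actually succeeded. It is written against POSIX sockets rather than Winsock so it can be compiled anywhere, and it is not the libpq code; the `sketch_*` function names and the loopback listener helper are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: listen on 127.0.0.1 with a kernel-chosen port. */
static int
sketch_listen_loopback(struct sockaddr_in *addr_out)
{
    socklen_t len = sizeof(*addr_out);
    int       lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(addr_out, 0, sizeof(*addr_out));
    addr_out->sin_family = AF_INET;
    addr_out->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr_out->sin_port = 0;     /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *) addr_out, sizeof(*addr_out)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *) addr_out, &len) < 0)
    {
        close(lfd);
        return -1;
    }
    return lfd;
}

/* Start a nonblocking connect; returns the socket, or -1 on hard failure. */
static int
sketch_start_connect(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int fl;

    if (fd < 0)
        return -1;
    fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
    {
        close(fd);
        return -1;
    }
    /* EINPROGRESS just means "not connected *yet*"; anything else is fatal. */
    if (connect(fd, (const struct sockaddr *) addr, sizeof(*addr)) < 0 &&
        errno != EINPROGRESS)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait for write-ready, then confirm the connect succeeded via SO_ERROR. */
static int
sketch_wait_connected(int fd, int timeout_sec)
{
    fd_set         wfds;
    struct timeval tv;
    int            err = 0;
    socklen_t      len = sizeof(err);

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        return -1;              /* timeout or select() error */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return -1;              /* the connect itself failed */
    return 0;                   /* now it is safe to write */
}
```

With select(), "writable" doubles as "connect finished", which is why waiting for write-ready is enough on this path. The WSAEventSelect-based path has no such guarantee if FD_WRITE is already set while the connect is still in flight.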
> This is problematic, because the APIs in between don't provide a way
> to report that we're still waiting for connect rather than for
> data-write-ready. Anybody have the stomach for extending PQconnectPoll's
> API with an extra PGRES_POLLING_CONNECTING state?
I'm a bit hesitant to do so at this phase of the release cycle. It'd
kind of force all users to upgrade their code, and I'm sure there are a
couple of out-of-tree ones. And that's not just code explicitly using
new versions of libpq, but also users of old versions - several
distributions just install newer libpq versions and rely on them being
compatible.
> If not, can we tell in
> WaitEventAdjustWin32 that the socket is still connecting and we must
> substitute FD_CONNECT for FD_WRITE?
I was wondering, for a second, if we should just always use FD_CONNECT
once in every set. But unfortunately there are plenty of places that
create/destroy sets at a high enough rate for that to not be a nice
solution.
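To make the "substitute FD_CONNECT for FD_WRITE" idea concrete, here is a minimal sketch of the flag mapping, only loosely modeled on what WaitEventAdjustWin32 does. Everything named `SK_*` or `sketch_*` is a stand-in: the `SK_WL_*` values are illustrative, the `SK_FD_*` values mirror the Winsock event bits, and the `still_connecting` argument is an assumption, since where that knowledge would come from is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the wait-event flags a caller passes in (illustrative values). */
#define SK_WL_SOCKET_READABLE   (1u << 1)
#define SK_WL_SOCKET_WRITEABLE  (1u << 2)

/* Stand-ins for Winsock network-event bits (values mirror winsock2.h). */
#define SK_FD_READ     0x01u
#define SK_FD_WRITE    0x02u
#define SK_FD_CONNECT  0x10u
#define SK_FD_CLOSE    0x20u

/*
 * Translate requested wait events into the network-event mask that would be
 * handed to WSAEventSelect().  While the socket is still connecting, a
 * request for "writeable" is mapped to FD_CONNECT instead of FD_WRITE.
 */
static uint32_t
sketch_win32_event_mask(uint32_t wl_events, bool still_connecting)
{
    uint32_t flags = SK_FD_CLOSE;   /* always watch for the peer closing */

    /* This sketch only understands the two socket flags. */
    assert((wl_events & ~(SK_WL_SOCKET_READABLE | SK_WL_SOCKET_WRITEABLE)) == 0);

    if (wl_events & SK_WL_SOCKET_READABLE)
        flags |= SK_FD_READ;
    if (wl_events & SK_WL_SOCKET_WRITEABLE)
        flags |= still_connecting ? SK_FD_CONNECT : SK_FD_WRITE;

    return flags;
}
```

The mapping itself is trivial; the hard part is that nothing between PQconnectPoll and the wait-event code currently tells us what to pass for `still_connecting`.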
A third solution would be to, for v10, add a #ifdef WIN32 block to
libpqrcv_connect() that just waits till FD_CONNECT is ready. That has
the disadvantage of not accepting interrupts, but still seems better
than not working at all. That's not much of a real solution, but this
late in the cycle it might be advisable to hold our noses :(
Greetings,
Andres Freund
| From | Date | Subject |
---|---|---|---|
Next Message | Tom Lane | 2017-08-06 17:34:42 | Re: Replication to Postgres 10 on Windows is broken |
Previous Message | Tom Lane | 2017-08-06 16:29:07 | Re: Replication to Postgres 10 on Windows is broken |