From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Petr Jelinek <petr(at)2ndquadrant(dot)com>, Shay Rojansky <roji(at)roji(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Some 9.5beta2 backend processes not terminating properly? |
Date: | 2016-01-02 13:26:47 |
Message-ID: | 20160102132647.mlwrv7dvtc3qzki5@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2016-01-02 18:40:38 +0530, Amit Kapila wrote:
> What I wanted to say is that the handling of socket closure is not the
> same in WaitLatchOrSocket() and pgwin32_waitforsinglesocket(),
> which is how this problem can arise, and that seems to be the
> right direction to pursue. I have found that
> in WaitLatchOrSocket(),
> even when the socket is closed, we remember the result as
> WL_SOCKET_READABLE and try to wait again, whereas the
> same case is handled properly in pgwin32_waitforsinglesocket().
That's actually intentional, and part of the design:
* When waiting on a socket, EOF and error conditions are reported by
* returning the socket as readable/writable or both, depending on
* WL_SOCKET_READABLE/WL_SOCKET_WRITEABLE being specified.
The way this is supposed to work, and does on unixoid systems, is that
WaitLatchOrSocket() returns, the recv() is retried, and that signals an error.
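As a rough illustration of the Unix-side behavior described above, here is a minimal standalone sketch. It is not PostgreSQL code: plain poll() stands in for the latch wait, and the helper names wait_then_recv() and closed_peer_demo() are hypothetical. It assumes a POSIX system with AF_UNIX socketpair() support. After the peer is closed, the wait reports the socket as readable, and the retried recv() is what actually surfaces the closure (here as EOF, a return value of 0).

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of PostgreSQL: wait (up to one second)
 * for fd to be reported readable, then retry recv().
 * Returns recv()'s result: >0 for data, 0 for EOF, -1 on error.
 * Also returns -1 if poll() times out or fails.
 * The flags reported by poll() are stored in *revents_out.
 */
static ssize_t
wait_then_recv(int fd, char *buf, size_t len, short *revents_out)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    *revents_out = 0;

    if (poll(&pfd, 1, 1000) <= 0)
        return -1;              /* timeout or poll() failure */

    *revents_out = pfd.revents;

    /* The wait only said "readable"; recv() tells us why. */
    return recv(fd, buf, len, 0);
}

/*
 * Hypothetical demo: create a connected socket pair, close one end,
 * then wait and read on the other end.
 * Returns 0 on success, -1 if the socket pair could not be created.
 */
static int
closed_peer_demo(short *revents_out, ssize_t *nread_out)
{
    int     sv[2];
    char    buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);               /* the peer goes away */

    *nread_out = wait_then_recv(sv[0], buf, sizeof(buf), revents_out);

    close(sv[0]);
    return 0;
}
```

Calling closed_peer_demo() from a small main() and printing the two output values is enough to observe this: the wait returns with a readable or hang-up indication, and the closure is only detected by the subsequent recv(). The question in this thread is whether the Windows path reaches the equivalent of that retried recv().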
> If we
> remember the closed socket event and then take appropriate action,
> then this problem won't happen. The attached patch, which is by no means
> a complete fix, shows what I wanted to say; with it, the problem
> mentioned by Shay doesn't happen, although I get a LOG message
> because proper handling for socket closure
> still needs to be done in this path. This patch is based on the code
> after commit 387da18874afa17156ee3af63766f17efb53c4b9. I
> will do testing and refine the fix based on HEAD later, as I am done
> for today.
It's weird that this fixes the problem. Since, according to Shay, we
previously were not busy looping, this seems to indicate that FD_CLOSE is
only reported once, or something like that?
It'd be very interesting to add a debug elog() into the
    if (resEvents.lNetworkEvents & FD_CLOSE)
    {
        if (wakeEvents & WL_SOCKET_READABLE)
            result |= WL_SOCKET_READABLE;
        if (wakeEvents & WL_SOCKET_WRITEABLE)
            result |= WL_SOCKET_WRITEABLE;
    }
path in WaitLatchOrSocket. If it actually returns with the current code,
we have a better idea where to look for problems.
Greetings,
Andres Freund