From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: BUG: possible busy loop when connection is closed |
Date: | 2004-09-23 08:12:48 |
Message-ID: | 1095927167.3552.11.camel@fuji.krosing.net |
Lists: | pgsql-hackers |
On Thu, 2004-09-23 at 06:41, Tom Lane wrote:
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > We were bitten by the following bug a few times, when our server tried
> > to reestablish connections under bad network conditions:
> >
> > if the connection is closed while trying to get the response to the SSL setup
> > packet (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as
> > line 1035 in 8.0.0.beta2:
> >
> > if (pqWaitTimed(1, 0, conn, finish_time)) {
> >
> > reports that there is data to read (returns 0) while poll() actually returned
> > an error (POLLERR & POLLHUP), not POLLIN, and
At least on Linux it does; we got the following trace:
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
This seems to say that poll() came back because of POLLHUP, and since there
is just one fd, it must mean that this fd is closed. But this may be
non-portable.
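The trace can be reproduced in miniature. This is a minimal sketch, assuming a POSIX system and using an AF_UNIX socketpair() as a stand-in for the real TCP connection; the names probe_closed_peer and struct probe_result are hypothetical. Once the peer is closed, every poll() reports the fd as ready and every recv() returns 0, so a loop that treats 0 as "no data yet" never terminates.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* What one poll()+recv() round observed on the surviving end. */
struct probe_result {
    int     poll_rc;   /* return value of poll(): 1 means "fd is ready" */
    short   revents;   /* events reported by poll() (POLLIN/POLLHUP/...) */
    ssize_t recv_rc;   /* return value of recv(): 0 means end-of-file */
};

/*
 * Hypothetical helper: create a connected socket pair, close one end to
 * simulate the server hanging up, then poll and read the other end
 * `rounds` times, mimicking the SSL startup wait. Returns 0 on success,
 * -1 if the socket pair could not be created.
 */
int probe_closed_peer(int rounds, struct probe_result *out)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);   /* the "server" goes away */

    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = sv[0], .events = POLLIN | POLLERR };
        char c;

        /* Finite timeout so the sketch cannot hang; it returns at once. */
        out[i].poll_rc = poll(&pfd, 1, 1000);
        out[i].revents = pfd.revents;
        /* Same one-byte read as the SSL startup code; yields 0 each time. */
        out[i].recv_rc = recv(sv[0], &c, 1, 0);
    }
    close(sv[0]);
    return 0;
}
```

On Linux we would expect each round to report POLLIN|POLLHUP with recv() returning 0, the same pattern as the strace output; other platforms may set different revents bits, which is the portability concern.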
> This is intentional: the idea is that we should go ahead and do the read
> (or write), which will detect the error condition on the socket. poll()
> in itself doesn't give enough information to determine what the error
> condition is, so it's not appropriate to fail here.
>
> > After that, the check on line 1462:
> >
> >     if (nread == 0)
> >         /* caller failed to wait for data */
> >         return PGRES_POLLING_READING;
> >
> > resumes the busy loop.
>
> This seems to me to be the bug. pqReadData jumps through hoops to
> determine whether a zero-length read means EOF or not, and I think we
> need to expend some effort to determine that here too.
>
> One possibility is to forget the direct call to recv() and use
> pqReadData --- since conn->ssl isn't set yet, and we aren't expecting
> the server to send more than one byte, this should in theory be safe.
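A minimal sketch of the direction described above, assuming a plain (non-SSL) stream socket; the names read_ssl_response and enum ssl_step_result are hypothetical and are not libpq's actual code. After poll() reports readiness, the read is done and its result decides the next state. Per POSIX, recv() on a stream socket with a nonzero length returns 0 only at end-of-file, while "no data yet" shows up as -1 with EAGAIN/EWOULDBLOCK, so a zero-length read is reported as EOF instead of sending the caller back to poll().

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcomes of one step of waiting for the SSL answer byte. */
enum ssl_step_result {
    SSL_STEP_GOT_BYTE,   /* server answered (e.g. 'S' or 'N') */
    SSL_STEP_WAIT,       /* nothing available yet: poll again */
    SSL_STEP_EOF,        /* server closed the connection */
    SSL_STEP_ERROR       /* hard socket error; errno says which */
};

/*
 * Read the single response byte after poll() has said the socket is
 * ready (for data, error, or hangup). Unlike the quoted check, a
 * zero-length read is not taken to mean "caller failed to wait".
 */
enum ssl_step_result read_ssl_response(int fd, char *resp)
{
    ssize_t nread = recv(fd, resp, 1, 0);   /* at most one byte */

    if (nread == 1)
        return SSL_STEP_GOT_BYTE;
    if (nread == 0)
        return SSL_STEP_EOF;    /* previously: keep polling -> busy loop */
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return SSL_STEP_WAIT;   /* genuinely no data yet */
    return SSL_STEP_ERROR;      /* e.g. ECONNRESET after POLLERR */
}
```

The caller would map SSL_STEP_EOF and SSL_STEP_ERROR to a connection failure and only SSL_STEP_WAIT to PGRES_POLLING_READING. This sidesteps the question of what poll() bits mean on a given platform, since the read itself classifies the condition.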
I was scared off by the comment before recv(..., 1, 0), which says we must
be careful not to read more than 1 byte.
Is there really no chance that we accidentally read more than one byte and
screw up the SSL handshake?
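The concern can be illustrated with a small sketch, again assuming a socketpair() stand-in and a hypothetical helper name (split_read_demo). A recv() with length 1 consumes exactly one byte and leaves anything sent after it queued in the kernel; a larger read into an application buffer, which is what a general-purpose routine does, would pull those trailing bytes out of the socket where the SSL library could no longer see them. The sketch shows only the mechanism; it says nothing about whether a real server ever sends more than the one answer byte before the client starts the handshake.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: the "server" sends `payload` (the one-byte SSL answer
 * followed by extra bytes) and then shuts down its write side. We read one
 * byte, then see how many bytes a bulk read would have taken after it.
 * Returns 0 on success, -1 on setup failure.
 */
int split_read_demo(const char *payload, size_t len,
                    ssize_t *first_rc, char *first_byte, ssize_t *rest_rc)
{
    int sv[2];
    char rest[64];   /* bulk buffer, stand-in for an input buffer */
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[1], payload, len, 0) == (ssize_t) len &&
        shutdown(sv[1], SHUT_WR) == 0) {
        /* Careful read: exactly one byte, the rest stays in the socket. */
        *first_rc = recv(sv[0], first_byte, 1, 0);
        /* What a bulk read would grab next: all the trailing bytes. */
        *rest_rc = recv(sv[0], rest, sizeof rest, 0);
        ok = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

With payload "SXYZ" we would expect the first read to return 1 with 'S' and the second to return the remaining 3 bytes; in the libpq case those would be bytes the SSL layer expected to read itself.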
-------------
Hannu