Re: Rare SSL failures on eelpout

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: Rare SSL failures on eelpout
Date: 2019-03-04 23:13:59
Message-ID: CA+hUKGK=EDuwNiTwrUJ456z+JRxLnj-_dSFk=2XNKObLuu-Fvw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 5, 2019 at 11:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > OK, here's something. I can reproduce it quite easily on this
> > machine, and I can fix it like this:
>
> >                           libpq_gettext("could not send startup packet: %s\n"),
> >                           SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> >           free(startpacket);
> > +         pqHandleSendFailure(conn);
> >           goto error_return;
> >       }
>
> Yeah. But I don't like pqHandleSendFailure all that much: it has strong
> constraints on what state libpq has to be in, as a consequence of which
> it's called from a bunch of ad-hoc places, and can't be called from
> some others. It's kind of accidental that it will work here.
>
> I was toying with the idea of getting rid of that function in
> favor of a design like this:
>
> * Handle any socket-write failure at some low level of libpq by
> recording the fact that the error occurred (plus whatever data we
> need to produce a message about it), and then starting to just
> discard output data.
>
> * Eventually, we'll try to read. Process any available input data
> as usual (and, if we get a read error, report that). When no more
> input data is available, if a socket write failure has been recorded,
> report that much as if it were an incoming ERROR message, and then
> shut down the connection.
>
> This would essentially give us pqHandleSendFailure-like behavior
> across the board, which might be enough to fix this problem as well as
> bug #15598. Or not ... as you say, we haven't thoroughly understood
> either issue, so it's possible this wouldn't do it.
>
> This would have the effect of prioritizing reports of socket read
> failures over those of write failures, which is an interesting
> behavioral change, but offhand I can't see a problem with it.

That seems to recreate (and extend) the behaviour that we see on
other TCP stacks, or on Linux with a remote connection, namely that
the first sendto() succeeds (and even though future sendto() calls may
fail, we don't usually get that far, because we read and discover an
application-level error message or whatever first).
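
Just to make sure we're picturing the same thing, here's a minimal
sketch of the kind of deferred-error handling I understand you to be
proposing.  conn_t, write_failed and write_err_msg are made-up names
for illustration, not real libpq structures:

/*
 * Sketch only: record a send() failure, keep discarding output, and
 * report the saved error only after draining whatever the server
 * managed to send before the connection broke.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

typedef struct
{
	int		sock;
	bool	write_failed;		/* a send() has already failed */
	char	write_err_msg[256];	/* saved message, reported later */
} conn_t;

/* Send, but on failure just remember the error and pretend success. */
ssize_t
conn_send(conn_t *conn, const void *buf, size_t len)
{
	ssize_t		n;

	if (conn->write_failed)
		return (ssize_t) len;	/* already failed: discard output */

	n = send(conn->sock, buf, len, 0);
	if (n < 0)
	{
		conn->write_failed = true;
		snprintf(conn->write_err_msg, sizeof(conn->write_err_msg),
				 "could not send data to server: %s", strerror(errno));
		return (ssize_t) len;	/* swallow the error for now */
	}
	return n;
}

/*
 * Process any input the server sent before the connection broke (for
 * example a FATAL message that raced with our failed write); only when
 * the input is exhausted do we fall back to the deferred write error.
 */
void
conn_finish(conn_t *conn)
{
	char		buf[8192];
	ssize_t		n;

	while ((n = recv(conn->sock, buf, sizeof(buf), 0)) > 0)
		fwrite(buf, 1, (size_t) n, stdout);	/* handle server messages */

	if (conn->write_failed)
		fprintf(stderr, "%s\n", conn->write_err_msg);
}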

> One thing that isn't real clear to me is how much timing sensitivity
> there is in "when no more input data is available". Can we assume that
> if we've gotten ECONNRESET or an allied error from a write, then any
> data the far end might've wished to send us is already available to
> read? In principle, since TCP allows shutting down either transmission
> direction independently, the server could close the read side of its
> socket while continuing to transmit data. In practice, PG servers
> don't do that; but could network timing glitches create the same effect?
> Even if it's possible, does it happen enough to worry about?

That is beyond my knowledge of networking; I have CC'd Andrew Gierth
in case he has something to say about that. It certainly seems
sensible to assume that eg RST must follow any data that the other end
sent before closing, if it did indeed call close(), and in our case we
know that it never calls shutdown(SHUT_RD), so the only other
possibility seems to be that it crashed or the connection was lost.
So it seems safe to assume that we can read the server's parting words
after we've had an EPIPE/ECONNRESET error.  Based on some quick
googling, it looks like SHUT_RD doesn't actually send anything anyway
(unlike SHUT_WR, which sends FIN and would result in EOF at the other
end), so I'm not sure SHUT_RD even "exists" outside the mind of the
TCP stack doing it, and therefore I'm not sure there is any "timing
glitch" that could resemble it.  But I don't know.

> The reason I'm concerned is that I don't think it'd be bright to ignore a
> send error until we see input EOF, which'd be the obvious way to solve a
> timing problem if there is one. If our send error was some transient
> glitch and the connection is really still open, then we might not get EOF,
> but we won't get a server reply either because no message went to the
> server. You could imagine waiting some small interval of time before
> deciding that it's time to report the write failure, but ugh.
>
> In any case, consuming immediately-available data before reporting the
> write error would already be a step forward over what we're doing now,
> I think.
>
> Thoughts?

It seems reasonable to me, on the grounds that sending data doesn't
mean the other guy got it anyway, and we can see that on other TCP
stacks.  Preferring errors resulting from reads seems sensible because
it's exactly what we want in these cases, and I can't immediately
think of a downside.  It's bigger surgery than I was thinking of, but
it seems to remove some pre-existing complicated code and replace it
with something simpler, so that seems like a win.

--
Thomas Munro
https://enterprisedb.com
