From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: backtrace_on_internal_error |
Date: | 2023-12-08 19:33:16 |
Message-ID: | 20231208193316.5ylgs4zb6zngwyg4@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2023-12-08 10:51:01 -0800, Andres Freund wrote:
> On 2023-12-08 13:46:07 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > On 2023-12-08 13:23:50 -0500, Tom Lane wrote:
> > >> Hmm, don't suppose you have a way to reproduce that?
> >
> > > After a bit of trying, yes. I put an abort() into pgtls_open_client(), after
> > > initialize_SSL(). Connecting does result in:
> > > LOG: could not accept SSL connection: Success
> >
> > OK. I can dig into that, unless you're already on it?
>
> I think I figured it out. Looks like we need to translate a closed socket
> (recvfrom() returning 0) to ECONNRESET or such.
I think we might just need to expand the existing branch for EOF:
				if (r < 0)
					ereport(COMMERROR,
							(errcode_for_socket_access(),
							 errmsg("could not accept SSL connection: %m")));
				else
					ereport(COMMERROR,
							(errcode(ERRCODE_PROTOCOL_VIOLATION),
							 errmsg("could not accept SSL connection: EOF detected")));
The openssl docs say:
  The following return values can occur:
  0
      The TLS/SSL handshake was not successful but was shut down controlled and by the specifications of the TLS/SSL protocol. Call SSL_get_error() with the return value ret to find out the reason.
  1
      The TLS/SSL handshake was successfully completed, a TLS/SSL connection has been established.
  <0
      The TLS/SSL handshake was not successful because a fatal error occurred either at the protocol level or a connection failure occurred. The shutdown was not clean. It can also occur if action is needed to continue the operation for nonblocking BIOs. Call SSL_get_error() with the return value ret to find out the reason.
Which fits with my reproducer - due to the abort the connection was *not* shut
down via SSL in a controlled manner, therefore r < 0.
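To make the three documented cases concrete, here is a minimal sketch of the classification. The enum and function names are hypothetical and exist only for illustration; nothing like this is in the tree, and real code would still call SSL_get_error() to find the actual reason.

```c
/* Hypothetical names, for illustration only. */
typedef enum
{
	HANDSHAKE_OK,				/* documented return value 1 */
	HANDSHAKE_CLEAN_SHUTDOWN,	/* documented return value 0 */
	HANDSHAKE_FATAL_OR_RETRY	/* documented return value <0 */
} HandshakeOutcome;

/*
 * Map the handshake return value onto the three cases quoted from the
 * OpenSSL documentation. Values other than 0 and 1 are only documented
 * as negative; anything unexpected lands in the last bucket as well.
 */
static HandshakeOutcome
classify_handshake_result(int ret)
{
	if (ret == 1)
		return HANDSHAKE_OK;
	if (ret == 0)
		return HANDSHAKE_CLEAN_SHUTDOWN;
	return HANDSHAKE_FATAL_OR_RETRY;
}
```

With the abort() reproducer the peer just vanishes, so the result falls into the last bucket rather than the controlled-shutdown one.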
Hm, oddly enough, there's this tidbit in the SSL_get_error() manpage:
  On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL,
  nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the
  returned error is SSL_ERROR_SSL with a meaningful error on the error stack.
But I reproduced this with 3.1.
Seems like we should just treat errno == 0 as a reason to emit the "EOF
detected" message?
I wonder if we should treat send/recv returning 0 differently, from an error
message perspective, during an established connection. Right now we produce
  could not receive data from client: Connection reset by peer
because be_tls_read() sets errno to ECONNRESET - despite that not having been
returned by the OS. But I guess that's a topic for another day.
Greetings,
Andres Freund