| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: backtrace_on_internal_error | 
| Date: | 2023-12-08 19:33:16 | 
| Message-ID: | 20231208193316.5ylgs4zb6zngwyg4@awork3.anarazel.de | 
| Lists: | pgsql-hackers | 
Hi,
On 2023-12-08 10:51:01 -0800, Andres Freund wrote:
> On 2023-12-08 13:46:07 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > On 2023-12-08 13:23:50 -0500, Tom Lane wrote:
> > >> Hmm, don't suppose you have a way to reproduce that?
> >
> > > After a bit of trying, yes.  I put an abort() into pgtls_open_client(), after
> > > initialize_SSL(). Connecting does result in:
> > > LOG:  could not accept SSL connection: Success
> >
> > OK.  I can dig into that, unless you're already on it?
>
> I think I figured it out. Looks like we need to translate a closed socket
> (recvfrom() returning 0) to ECONNRESET or such.
I think we might just need to expand the existing branch for EOF:
				if (r < 0)
					ereport(COMMERROR,
							(errcode_for_socket_access(),
							 errmsg("could not accept SSL connection: %m")));
				else
					ereport(COMMERROR,
							(errcode(ERRCODE_PROTOCOL_VIOLATION),
							 errmsg("could not accept SSL connection: EOF detected")));
The openssl docs say:
The following return values can occur:
0
    The TLS/SSL handshake was not successful but was shut down controlled and by the specifications of the TLS/SSL protocol. Call SSL_get_error() with the return value ret to find out the reason.
1
    The TLS/SSL handshake was successfully completed, a TLS/SSL connection has been established.
<0
    The TLS/SSL handshake was not successful because a fatal error occurred either at the protocol level or a connection failure occurred. The shutdown was not clean. It can also occur if action is needed to continue the operation for nonblocking BIOs. Call SSL_get_error() with the return value ret to find out the reason.
Which fits with my reproducer - due to the abort the connection was *not* shut
down via SSL in a controlled manner, therefore r < 0.
Hm, oddly enough, there's this tidbit in the SSL_get_error() manpage:
 On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL,
 nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the
 returned error is SSL_ERROR_SSL with a meaningful error on the error stack.
But I reproduced this with 3.1.
Seems like we should just treat errno == 0 as a reason to emit the "EOF
detected" message?
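A minimal sketch of that idea, outside of the backend so it can be compiled on its own. The helper name ssl_accept_syscall_message() and its arguments are hypothetical, plain snprintf()/strerror() stand in for ereport() and %m, and it assumes errno was saved right after SSL_accept() reported SSL_ERROR_SYSCALL:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: pick the log message for a failed
 * SSL_accept() whose SSL_get_error() result was SSL_ERROR_SYSCALL.
 *
 * r           - return value of SSL_accept()
 * saved_errno - errno captured immediately after the call
 *
 * A real errno is reported as-is.  errno == 0 (or r == 0) is treated as
 * an unexpected EOF, so we never print "... : Success".
 */
const char *
ssl_accept_syscall_message(int r, int saved_errno, char *buf, size_t buflen)
{
	if (r < 0 && saved_errno != 0)
		snprintf(buf, buflen, "could not accept SSL connection: %s",
				 strerror(saved_errno));
	else
		snprintf(buf, buflen, "could not accept SSL connection: EOF detected");

	return buf;
}
```

With r < 0 and errno == 0, as in my reproducer, this yields the "EOF detected" text instead of "Success".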
I wonder if we should treat send/recv returning 0 differently, from an error
message perspective, during an established connection. Right now we produce
  could not receive data from client: Connection reset by peer
because be_tls_read() sets errno to ECONNRESET - despite that not having been
returned by the OS.  But I guess that's a topic for another day.
Greetings,
Andres Freund