From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: backtrace_on_internal_error |
Date: | 2023-12-09 21:06:42 |
Message-ID: | 20231209210642.h3uxp66annsueufh@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2023-12-09 12:41:30 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> >> I was wondering about that too. But if we do so, why not also do it for
> >> writes?
>
> > Writes don't act that way, do they? EOF on a pipe gives you an error,
> > not silently reporting that zero bytes were written and leaving you
> > to retry indefinitely.
>
> On further reflection I realized that you're right so far as the SSL
> code path goes, because SSL_write() can involve physical reads as well
> as writes, so at least in principle it's possible that we'd see EOF
> reported this way from that function.
Heh. I'll just claim that's what I was thinking about.
> Also, the libpq side does need work of the same sort, leading to the
> v2-0001 patch attached.
I'd perhaps add a comment explaining why it's plausible that we'd see that
that in the write case.
> I also realized that we have more or less the same problem at the
> caller level, allowing a similar failure for non-SSL connections.
> So I'm also proposing 0002 attached. Your results from aggregated
> logs didn't show "could not receive data from client: Success" as a
> common case, but since we weren't bothering to zero errno beforehand,
> it's likely that such failures would show up with very random errnos.
I did only look at the top ~100 internal errors, after trying to filter out
extensions, i.e. the list wasn't exhaustive. There's also very few non-ssl
connections. But I just checked, and for that error message, I do see some
XX000, but only in older versions. There's ENETUNREACH, ECONNRESET,
ETIMEDOUT, EHOSTUNREACH which these days are all handled as non XX000 by
errcode_for_socket_access().
> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
> index bd72a87bbb..76b647ce1c 100644
> --- a/src/interfaces/libpq/fe-secure.c
> +++ b/src/interfaces/libpq/fe-secure.c
> @@ -211,6 +211,8 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
> int result_errno = 0;
> char sebuf[PG_STRERROR_R_BUFLEN];
>
> + SOCK_ERRNO_SET(0);
> +
> n = recv(conn->sock, ptr, len, 0);
>
> if (n < 0)
> @@ -232,6 +234,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
>
> case EPIPE:
> case ECONNRESET:
> + case 0: /* treat as EOF */
> libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
> "\tThis probably means the server terminated abnormally\n"
> "\tbefore or while processing the request.");
If we were treating it as EOF, we'd not "queue" an error message, no? Normally
recv() returns 0 in that case, so we'd just return, right?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-12-09 21:39:37 | Re: encoding affects ICU regex character classification |
Previous Message | Sutou Kouhei | 2023-12-09 21:01:36 | Re: Make COPY format extendable: Extract COPY TO format implementations |