From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-committers(at)postgresql(dot)org |
Subject: | Re: pgsql: Make new crash restart test a bit more robust. |
Date: | 2017-09-19 23:16:50 |
Message-ID: | 20170919231650.mjb6awkmkrfop2rs@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
On 2017-09-19 13:53:18 -0700, Andres Freund wrote:
> On 2017-09-19 16:46:58 -0400, Tom Lane wrote:
> > Have we forgotten an fflush() or something?
> >
> > Also, maybe problem is on client side. I vaguely recall a libpq bug
> > wherein it would complain about socket EOF even though data remained
> > to be processed. Maybe we reintroduced something like that?
>
> That seems quite possible.
Preface: This is largely a as of yet unverified theory. But seems
worthwhile to investigate whether it's the cause of this issue or not.
By changing the error message I know that the "server closed the
connection unexpectedly" ERROR is coming from pqsecure_raw_read(). The
caller here has to be pqReadData(). Where have the following block:
if (nread > 0)
{
conn->inEnd += nread;
/*
* Hack to deal with the fact that some kernels will only give us back
* 1 packet per recv() call, even if we asked for more and there is
* more available. If it looks like we are reading a long message,
* loop back to recv() again immediately, until we run out of data or
* buffer space. Without this, the block-and-restart behavior of
* libpq's higher levels leads to O(N^2) performance on long messages.
*
* Since we left-justified the data above, conn->inEnd gives the
* amount of data already read in the current message. We consider
* the message "long" once we have acquired 32k ...B
*/
if (conn->inEnd > 32768 &&
(conn->inBufSize - conn->inEnd) >= 8192)
{
someread = 1;
goto retry3;
}
return 1;
}
So imagine we've just read one block containing the error message from
the server. That's going to be less than 32kb. So we go into the retry3
path.
/* OK, try to read some data */
retry3:
nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
conn->inBufSize - conn->inEnd);
if (nread < 0)
{
if (SOCK_ERRNO == EINTR)
goto retry3;
/* Some systems return EAGAIN/EWOULDBLOCK for no data */
#ifdef EAGAIN
if (SOCK_ERRNO == EAGAIN)
return someread;
#endif
#if defined(EWOULDBLOCK) && (!defined(EAGAIN) || (EWOULDBLOCK != EAGAIN))
if (SOCK_ERRNO == EWOULDBLOCK)
return someread;
#endif
/* We might get ECONNRESET here if using TCP and backend died */
#ifdef ECONNRESET
if (SOCK_ERRNO == ECONNRESET)
goto definitelyFailed;
#endif
/* pqsecure_read set the error message for us */
return -1;
}
So if the connection actually was closed we:
definitelyFailed:
/* Do *not* drop any already-read data; caller still wants it */
pqDropConnection(conn, false);
conn->status = CONNECTION_BAD; /* No more connection to backend */
return -1;
and PQgetResult() will happily signal that upwards with an error:
/* Wait for some more data, and load it. */
if (flushResult ||
pqWait(TRUE, FALSE, conn) ||
pqReadData(conn) < 0)
{
/*
* conn->errorMessage has been set by pqWait or pqReadData. We
* want to append it to any already-received error message.
*/
pqSaveErrorResult(conn);
conn->asyncStatus = PGASYNC_IDLE;
return pqPrepareAsyncResult(conn);
ISTM, we need to react differently if we've already partially read data
successfully? Am I missing something?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-09-19 23:18:18 | Re: pgsql: Avoid use of non-portable strnlen() in pgstat_clip_activity(). |
Previous Message | Peter Eisentraut | 2017-09-19 22:56:14 | pgsql: Add basic TAP test setup for pg_upgrade |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2017-09-19 23:29:33 | Re: SCRAM in the PG 10 release notes |
Previous Message | Melanie Plageman | 2017-09-19 23:15:46 | Re: [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable. |