walsender termination error messages worse in v10

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: walsender termination error messages worse in v10
Date: 2017-06-02 18:51:26
Message-ID: 20170602185126.xzlcoublmaduixcy@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date: 2017-03-23 08:36:36 -0400

Logical replication support for initial data copy

made walreceiver emit worse messages in v10 than before when the master
gets shut down. Before 10 you'll get something like:

2017-06-02 11:46:07.173 PDT [16143][] LOG: replication terminated by primary server
2017-06-02 11:46:07.173 PDT [16143][] DETAIL: End of WAL reached on timeline 1 at 0/1B7FB8C8.
2017-06-02 11:46:07.173 PDT [16143][] FATAL: could not send end-of-streaming message to primary: no COPY in progress

the last bit is a bit superflous, but it still kinda makes sense. Now
you get:
2017-06-02 11:47:09.362 PDT [17061][] FATAL: unexpected result after CommandComplete: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

the reason is that

static int
libpqrcv_receive(char **buffer, pgsocket *wait_fd)
{

previously did:
if (rawlen == -1) /* end-of-streaming or error */
{
PGresult *res;

res = PQgetResult(streamConn);
if (PQresultStatus(res) == PGRES_COMMAND_OK ||
PQresultStatus(res) == PGRES_COPY_IN)
{
PQclear(res);
return -1;
}
else
{
PQclear(res);
ereport(ERROR,
(errmsg("could not receive data from WAL stream: %s",
PQerrorMessage(streamConn))));
}
}

and now does

if (rawlen == -1) /* end-of-streaming or error */
{
PGresult *res;

res = PQgetResult(conn->streamConn);
if (PQresultStatus(res) == PGRES_COMMAND_OK)
{
PQclear(res);

/* Verify that there are no more results */
res = PQgetResult(conn->streamConn);
if (res != NULL)
ereport(ERROR,
(errmsg("unexpected result after CommandComplete: %s",
PQerrorMessage(conn->streamConn))));
return -1;
}
else if (PQresultStatus(res) == PGRES_COPY_IN)
{
PQclear(res);
return -1;
}
else
{
PQclear(res);
ereport(ERROR,
(errmsg("could not receive data from WAL stream: %s",
pchomp(PQerrorMessage(conn->streamConn)))));
}
}

note the new /* Verify that there are no more results */ bit.

I don't understand why the new block is there, nor does the commit
message explain it.

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-06-02 18:52:36 Re: Get stuck when dropping a subscription during synchronizing table
Previous Message Tom Lane 2017-06-02 18:33:09 Re: Perfomance bug in v10