From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | masao(dot)fujii(at)gmail(dot)com |
Cc: | thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: FATAL: could not send end-of-streaming message to primary: no COPY in progress |
Date: | 2016-04-20 08:18:30 |
Message-ID: | 20160420.171830.207678798.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
At Wed, 20 Apr 2016 16:16:40 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHvzV2J0QodA8x1xCx3CbaBmJTveQeoLFzX8hq5G25jEA(at)mail(dot)gmail(dot)com>
> On Thu, Mar 31, 2016 at 9:15 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > Hi hackers,
> >
> > If you shut down a primary server, a standby that is streaming from it says:
> >
> > LOG: replication terminated by primary server
> > DETAIL: End of WAL reached on timeline 1 at 0/14F4B68.
> > FATAL: could not send end-of-streaming message to primary: no COPY in progress
> >
> > Isn't that FATAL ereport a bug?
>
> ISTM that the cause is that walsender exits and the replication connection is
> closed just after "COPY 0" is sent. That is, after receiving "COPY 0",
> walreceiver tries to send an end-of-copy message to the primary, but fails
> because the connection has already been closed.
Although the message is followed by repeated FATAL messages of other
kinds, the message above by itself seems a bit alarming.
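The sequence Fujii describes can be illustrated with a small Python simulation. It is only a sketch: `FakeConnection`, `primary_shutdown` and `standby_on_end_of_stream` are hypothetical stand-ins, not libpq or PostgreSQL functions, and closing the fake connection is assumed to end its COPY state, which is a simplification of what libpq actually does.

```python
class FakeConnection:
    """Hypothetical stand-in for a replication connection (not libpq)."""

    def __init__(self) -> None:
        self.open = True
        self.copy_in_progress = True
        self.inbox = []  # messages already received from the primary

    def close(self) -> None:
        # In this simulation, closing the connection also ends the COPY state.
        self.open = False
        self.copy_in_progress = False

    def send_end_of_copy(self) -> None:
        if not self.open or not self.copy_in_progress:
            raise RuntimeError("no COPY in progress")
        self.copy_in_progress = False


def primary_shutdown(conn: FakeConnection) -> None:
    # walsender reports the end of streaming and then exits right away.
    conn.inbox.append("COPY 0")
    conn.close()


def standby_on_end_of_stream(conn: FakeConnection) -> str:
    # In this simulation the buffered "COPY 0" is still readable after close.
    message = conn.inbox.pop(0)
    if message != "COPY 0":
        return "streaming"
    # walreceiver answers the end of streaming with its own end-of-copy.
    try:
        conn.send_end_of_copy()
    except RuntimeError as exc:
        return f"FATAL: could not send end-of-streaming message to primary: {exc}"
    return "waiting"
```

Calling `primary_shutdown` before `standby_on_end_of_stream` yields the FATAL string; if the primary only appends "COPY 0" without closing, the standby reaches "waiting" instead.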
> > How is clean server shutdown supposed to work?
>
> One option is to make walsender wait for end-of-copy message from walreceiver
> before it closes the connection and exits, after sending "COPY 0" message.
> But one question is: how should walsender behave when walreceiver gets stuck
> and cannot reply with an end-of-copy message? Probably we would need
> a timeout (maybe we can use wal_sender_timeout here, but I'm not sure yet
> whether it's appropriate).
-1. It would be useless for anything other than avoiding the FATAL message.
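For concreteness, the first option amounts to a bounded wait on the primary side. The following Python sketch is hypothetical throughout: `wait_for_copy_done`, the `reply_arrived` callback and the simulated millisecond clock are illustrative names, not walsender.c code, and `timeout_ms` merely plays the role that something like wal_sender_timeout might play.

```python
from enum import Enum
from typing import Callable


class ShutdownResult(Enum):
    CLEAN = "clean"          # standby acknowledged the end of copy in time
    TIMED_OUT = "timed_out"  # gave up waiting; close the connection anyway


def wait_for_copy_done(reply_arrived: Callable[[int], bool],
                       timeout_ms: int, tick_ms: int = 100) -> ShutdownResult:
    """Poll, in simulated time, for the standby's end-of-copy reply.

    reply_arrived(now_ms) is a hypothetical hook reporting whether the
    reply has been received by simulated time now_ms.
    """
    now = 0
    while now <= timeout_ms:
        if reply_arrived(now):
            return ShutdownResult.CLEAN
        now += tick_ms
    return ShutdownResult.TIMED_OUT
```

The sketch makes the cost visible: a stuck standby delays the primary's exit by up to `timeout_ms`, and the only gain is a quieter log on the standby.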
> Another option is to prevent walreceiver from sending an end-of-copy message.
> If "COPY 0" always means the exit of walsender and the termination of
> the connection, there seems to be no need to send back an end-of-copy message.
> I've not checked yet how this interferes with other replication logics, though.
Looking into walsender.c, walsender treats "COPY 0" as a signal that
its death, that is, proc_exit(0), follows immediately.
On the other hand, the comment at the beginning of walreceiver.c
says:
* If the primary server ends streaming, but doesn't disconnect, walreceiver
* goes into "waiting" mode, and waits for the startup process to give new
* instructions. The startup process will treat that the same as
* disconnection, and will rescan the archive/pg_xlog directory. But when the
* startup process wants to try streaming replication again, it will just
* nudge the existing walreceiver process that's waiting, instead of launching
* a new one.
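The behavior described in that comment can be read as a small state machine. This Python sketch is an interpretation under assumptions: the state names and the event strings ("end_of_streaming", "startup_nudge", "connection_closed") are hypothetical labels, not identifiers from walreceiver.c.

```python
from enum import Enum, auto


class WalRcvState(Enum):
    STREAMING = auto()
    WAITING = auto()
    STOPPED = auto()


def next_state(state: WalRcvState, event: str) -> WalRcvState:
    """Hypothetical transition function mirroring the walreceiver.c comment."""
    if event == "connection_closed":
        # In this sketch, losing the connection ends this walreceiver;
        # a new one would have to be launched later.
        return WalRcvState.STOPPED
    if state is WalRcvState.STREAMING and event == "end_of_streaming":
        # Primary ended streaming but kept the connection: park and wait.
        return WalRcvState.WAITING
    if state is WalRcvState.WAITING and event == "startup_nudge":
        # Startup process wants streaming again: reuse the waiting process.
        return WalRcvState.STREAMING
    # Any other event leaves the state unchanged.
    return state
```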
If we assume this is useful behavior and want to keep it, a
termination right after the end of XLOG streaming is just the same
as the one psql reports:
| FATAL: terminating connection due to administrator command
| server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.
Or, we should provide another command to announce a termination.
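From the standby side, such a command would let walreceiver decide whether to reply at all. A minimal sketch, assuming a hypothetical `TERMINATING` message distinct from the existing end-of-streaming signal; no such message exists in the protocol discussed here, and all names are illustrative.

```python
from enum import Enum, auto
from typing import Tuple


class ServerMessage(Enum):
    END_OF_STREAMING = auto()  # primary stops streaming, may keep the connection
    TERMINATING = auto()       # hypothetical: primary announces it is exiting


def handle_end_message(msg: ServerMessage) -> Tuple[bool, str]:
    """Return (send_end_of_copy, next_action) for the standby side."""
    if msg is ServerMessage.TERMINATING:
        # The primary closes the connection itself; replying would only race
        # with that close, so skip the end-of-copy and exit without FATAL.
        return (False, "exit quietly")
    # Plain end of streaming: acknowledge it and wait for the startup process.
    return (True, "wait for startup process")
```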
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center