Re: walreceiver termination

From: Justin King <kingpin867(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: walreceiver termination
Date: 2020-04-23 19:51:22
Message-ID: CAE39h22JyUze31RVcbNa-MN2eKW6POa1RX6V+=u9EGYJD=pM6A@mail.gmail.com
Lists: pgsql-general

I assume it would be related to the following:

LOG: incorrect resource manager data checksum in record at 2D6/C259AB90

since the walreceiver terminates just after this, but I'm unclear on
what precisely this means. Without digging into the code, I would
guess that it's unable to verify the checksum on the WAL record it just
received from the master; the other replicas take the same stream
without complaint, which would point to an issue on this client.
However, it happens everywhere -- we have ~16 replicas across 3
different clusters (on different versions) and we see this uniformly
across them all at seemingly random times. To clarify, it only ever
happens on a single replica at a time.
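
For anyone who wants to look at the record named in that log line, here is a rough sketch of how the LSN 2D6/C259AB90 maps to a WAL segment file name and a byte offset inside it. It assumes the default 16MB segment size and timeline 1, neither of which is stated above, and the function name wal_location is just something made up for illustration.

```python
def wal_location(lsn, timeline=1, segment_size=16 * 1024 * 1024):
    """Map an LSN such as '2D6/C259AB90' to (segment file name, byte offset).

    timeline and segment_size are assumptions (timeline 1, 16MB segments);
    adjust them to match the cluster being examined.
    """
    hi_text, lo_text = lsn.split("/")
    # An LSN is a 64-bit byte position printed as two 32-bit hex halves.
    position = (int(hi_text, 16) << 32) | int(lo_text, 16)
    # Number of segments that fit in one 4GB "log" unit (256 for 16MB segments).
    segments_per_log = 0x100000000 // segment_size
    segment_no, offset = divmod(position, segment_size)
    name = "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_log,
        segment_no % segments_per_log,
    )
    return name, offset


if __name__ == "__main__":
    name, offset = wal_location("2D6/C259AB90")
    print(name, offset)  # 00000001000002D6000000C2 5876624
```

The resulting file name can then be compared between the master and the affected replica, or inspected with pg_waldump (pg_xlogdump on versions before 10), to see whether the bytes at that offset actually differ.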

On Thu, Apr 23, 2020 at 2:46 PM Justin King <kingpin867(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2020 at 12:47 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Justin King <kingpin867(at)gmail(dot)com> writes:
> > > We've seen unexpected termination of the WAL receiver process. This
> > > stops streaming replication, but the replica stays available --
> > > restarting the server resumes streaming replication where it left off.
> > > We've seen this across nearly every recent version of PG (9.4, 9.5,
> > > 11.x, 12.x) -- anything omitted is one we haven't used.
> >
> > > I don't have an explanation for the cause, but I was able to set
> > > logging to "debug5" and run an strace of the walreceiver PID when it
> > > eventually happened. It appears as if the SIGTERM is coming from the
> > > "postgres: startup" process.
> >
> > The startup process intentionally SIGTERMs the walreceiver under
> > various circumstances, so I'm not sure that there's any surprise
> > here. Have you checked the postmaster log?
> >
> > regards, tom lane
>
> Yep, I included "debug5" output of the postmaster log in the initial post.
