From: | Justin King <kingpin867(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: walreceiver termination |
Date: | 2020-04-23 19:51:22 |
Message-ID: | CAE39h22JyUze31RVcbNa-MN2eKW6POa1RX6V+=u9EGYJD=pM6A@mail.gmail.com |
Lists: | pgsql-general |
I assume it's related to the following:

LOG: incorrect resource manager data checksum in record at 2D6/C259AB90

since the walreceiver terminates just after this, but I'm unclear on
what precisely it means. Without digging into the code, I would
guess that the replica is unable to verify the checksum on the segment
it just received from the master. There are multiple replicas here,
so that would point to an issue on this client. On the other hand, it
happens everywhere -- we have ~16 replicas across 3 different clusters
(on different versions), and we see this uniformly across all of them
at seemingly random times. Also, just to clarify, this only ever
happens on a single replica at a time.
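
To look at the record named in that log line, it helps to know which WAL segment file the LSN falls in. A minimal Python sketch of the mapping, assuming the default 16 MB segment size and timeline 1 (neither is stated in this thread); the function name is hypothetical:

```python
def lsn_to_wal_segment(lsn, timeline=1, segment_size=16 * 1024 * 1024):
    """Map an LSN such as '2D6/C259AB90' to (segment file name, byte offset).

    timeline and segment_size are assumptions: 1 and 16 MB are common
    defaults, but the real values depend on the cluster.
    """
    high_hex, low_hex = lsn.split("/")
    # An LSN is a 64-bit byte position printed as two 32-bit hex halves.
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segno = position // segment_size
    segments_per_xlogid = 0x100000000 // segment_size
    name = "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )
    return name, position % segment_size


name, offset = lsn_to_wal_segment("2D6/C259AB90")
print(name, hex(offset))  # 00000001000002D6000000C2 0x59ab90
```

On the primary, the built-in pg_walfile_name() function (pg_xlogfile_name() before version 10) gives the same answer using the actual timeline. Comparing that segment on the primary with the copy on the affected replica might show whether the bytes differ at that offset.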
On Thu, Apr 23, 2020 at 2:46 PM Justin King <kingpin867(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2020 at 12:47 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Justin King <kingpin867(at)gmail(dot)com> writes:
> > > We've seen unexpected termination of the WAL receiver process. This
> > > stops streaming replication, but the replica stays available --
> > > restarting the server resumes streaming replication where it left off.
> > > We've seen this across nearly every recent version of PG (9.4, 9.5,
> > > 11.x, 12.x) -- anything omitted is one we haven't used.
> >
> > > I don't have an explanation for the cause, but I was able to set
> > > logging to "debug5" and run an strace of the walreceiver PID when it
> > > eventually happened. It appears as if the SIGTERM is coming from the
> > > "postgres: startup" process.
> >
> > The startup process intentionally SIGTERMs the walreceiver under
> > various circumstances, so I'm not sure that there's any surprise
> > here. Have you checked the postmaster log?
> >
> > regards, tom lane
>
> Yep, I included "debug5" output of the postmaster log in the initial post.