Re: walreceiver termination

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: kingpin867(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: walreceiver termination
Date: 2020-05-08 00:56:03
Message-ID: 20200508.095603.1336764899960166654.horikyota.ntt@gmail.com
Lists: pgsql-general

Hello.

At Mon, 4 May 2020 09:09:15 -0500, Justin King <kingpin867(at)gmail(dot)com> wrote in
> Would there be anyone that might be able to help troubleshoot this
> issue -- or at least give me something that would be helpful to look
> for?
>
> https://www.postgresql.org/message-id/flat/CAGH8ccdWLLGC7qag5pDUFbh96LbyzN_toORh2eY32-2P1%3Dtifg%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/CANQ55Tsoa6%3Dvk2YkeVUN7qO-2YdqJf_AMVQxqsVTYJm0qqQQuw%40mail.gmail.com
> https://dba.stackexchange.com/questions/116569/postgresql-docker-incorrect-resource-manager-data-checksum-in-record-at-46f-6
>
> I'm not the first one to report something similar and all the
> complaints have a different filesystem in common -- particularly ZFS
> (or btrfs, in the bottom case). Is there anything more we can do here
> to help narrow down this issue? I'm happy to help, but I honestly
> wouldn't even know where to begin.

The sendto() call at the end of your strace output is the "close
connection" request to the WAL sender, and it should normally be
followed by close() and kill(). If that really is the last line of the
strace output, the sendto() is blocked because the send buffer is full.

My diagnosis is that your replication connection ran into trouble and
the TCP session was broken in a way that the WAL receiver could not
detect. As a result, feedback messages from the WAL receiver piled up
in the TCP send buffer, and finally the last sendto() blocked while
sending the close-connection message.
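
To illustrate the mechanism, here is a minimal sketch, not taken from
the PostgreSQL sources, of how writes to a peer that never reads are
first absorbed by the kernel buffers and then stop being accepted. The
function name fill_send_buffer and its parameters are hypothetical. It
uses a loopback connection and a non-blocking socket so that the point
where a blocking sendto() would hang shows up as BlockingIOError
instead.

```python
import socket


def fill_send_buffer(chunk_size=65536, max_bytes=1 << 30):
    """Send to a peer that never reads.

    Returns (bytes_accepted, would_block): how many bytes the kernel
    buffered, and whether a further send would have blocked.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()  # accepted, but deliberately never read
    sender.setblocking(False)    # report "would block" instead of hanging

    chunk = b"x" * chunk_size
    sent = 0
    would_block = False
    try:
        while sent < max_bytes:
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                # Buffers are full: a blocking send would stall here,
                # much like the last sendto() in the strace output.
                would_block = True
                break
    finally:
        sender.close()
        peer.close()
        listener.close()
    return sent, would_block


if __name__ == "__main__":
    accepted, blocked = fill_send_buffer()
    print(f"kernel accepted {accepted} bytes; next send would block: {blocked}")
```

The number of bytes accepted depends on the kernel's buffer sizes, so
it will differ between systems.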

If this happens regularly, routers or firewalls between the primary
and the standby may be dropping sessions inadvertently.
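
One common way to notice a session that a middlebox has silently
dropped is TCP keepalive. The following is only a sketch at the socket
level, with a hypothetical helper name (enable_keepalive) and
arbitrary example timings; it is not how the WAL receiver is
implemented. PostgreSQL exposes comparable knobs as the
tcp_keepalives_* server settings and the keepalives_* connection
parameters, which may be worth checking on both sides.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for sock.

    idle, interval and count are example values (seconds, seconds,
    probes), not recommendations. The per-socket tuning options are
    platform specific, so each one is applied only if this platform's
    socket module defines it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # gap between unanswered probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

With probes enabled, a dead session is reported as an error after the
probes go unanswered, rather than leaving the sender waiting on a full
buffer.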

I'm not sure how ZFS could be involved in this problem, though.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
