Re: Postgresql 11: terminating walsender process due to replication timeout

From: Abhishek Bhola <abhishek(dot)bhola(at)japannext(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql 11: terminating walsender process due to replication timeout
Date: 2021-09-09 07:06:25
Message-ID: CAEDsCzgNM=uni8fcf4f9PucjGxCdcRQi5uj5UEr_zTBzK5Bwag@mail.gmail.com
Lists: pgsql-general

sourcedb:~$ postgres --version
postgres (PostgreSQL) 11.6

Sorry for omitting this information earlier.
It looks like this fix is already included in the version I am running.
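
For anyone else checking the same thing, here is a minimal sketch of that comparison. It assumes the `postgres --version` output format shown above; the helper names `parse_version` and `includes_fix` are hypothetical, and the check only considers the 11.x branch discussed in this thread.

```python
import re

# Minor release of the 11.x branch whose release notes list the fix.
FIX_VERSION = (11, 6)


def parse_version(output: str) -> tuple:
    """Extract (major, minor) from `postgres --version` style output."""
    match = re.search(r"\(PostgreSQL\)\s+(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return int(match.group(1)), int(match.group(2))


def includes_fix(output: str) -> bool:
    """True if an 11.x server is at or past the minor release with the fix."""
    major, minor = parse_version(output)
    # Only the 11.x branch is considered here; other branches return False.
    return major == FIX_VERSION[0] and minor >= FIX_VERSION[1]


if __name__ == "__main__":
    print(includes_fix("postgres (PostgreSQL) 11.6"))
```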

Regards,
Abhishek Bhola

On Thu, Sep 9, 2021 at 3:56 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola <
> abhishek(dot)bhola(at)japannext(dot)co(dot)jp> wrote in
> > I have found some questions about the same error, but didn't find any of
> > them answering my problem.
> >
> > The setup is that I have two Postgres11 clusters (A and B) and they are
> > making use of publication and subscription features to copy data from A
> > to B.
> >
> > A (source DB- publication) --------------> B (target DB - subscription)
> >
> > This works fine, but often (not always) when the data volume being
> > inserted on a table in node A increases, it gives the following error.
> >
> > "terminating walsender process due to replication timeout"
> >
> > The data volume at the moment being entered is about 30K rows per second
> > continuously for hours through COPY command.
> >
> > Earlier the wal_sender_timeout was set to 5 sec and I would see this
> > error much more often. I then increased it to 1 min and the frequency of
> > this error reduced. But I don't want to keep increasing it without
> > understanding what is causing it. I looked at the code of walsender.c and
> > know the exact lines where it's coming from.
> >
> > But I am still not clear which parameter is making the sender assume that
> > the receiver node is inactive and therefore it should stop the
> > wal_sender.
> >
> > Can anyone please suggest what changes I should make to remove this
> > error?
>
> Which minor version is the Postgres server you mentioned? PostgreSQL 11
> got the following fix in 11.6, which could be related to the
> trouble.
>
> https://www.postgresql.org/docs/11/release-11-6.html
>
> > Fix timeout handling in logical replication walreceiver processes
> > (Julien Rouhaud)
> >
> > Erroneous logic prevented wal_receiver_timeout from working in
> > logical replication deployments.
>
> The details of the fix are here.
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098
> > Fix timeout handling in logical replication worker
> >
> > The timestamp tracking the last moment a message is received in a
> > logical replication worker was initialized in each loop checking if a
> > message was received or not, causing wal_receiver_timeout to be ignored
> > in basically any logical replication deployments. This also broke the
> > ping sent to the server when reaching half of wal_receiver_timeout.
>
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
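
The commit message quoted above describes a timestamp that was re-initialized on every loop iteration, so the timeout could never fire. A toy model of that behaviour, as I read it, is sketched below. This is not PostgreSQL's code: the function `timeout_detected`, its parameters, and the event list are all hypothetical and only illustrate why resetting the timestamp each iteration hides the timeout.

```python
def timeout_detected(events, timeout, reset_each_loop):
    """Toy model of a receive loop with a 'last message received' timestamp.

    events: list of (now, got_message) tuples, one per loop iteration,
            with `now` in ascending order (hypothetical input format).
    timeout: positive number of time units without a message before giving up.
    reset_each_loop: True reproduces the described bug, False the fixed logic.

    Returns the time at which the timeout is detected, or None if it never is.
    """
    last_recv = events[0][0] if events else 0
    for now, got_message in events:
        if reset_each_loop:
            # Buggy variant: the timestamp is re-initialized on every
            # iteration, so the elapsed time below is always zero.
            last_recv = now
        if got_message:
            last_recv = now
        elif now - last_recv >= timeout:
            return now
    return None


if __name__ == "__main__":
    # Eleven iterations, one per time unit, with no message ever arriving.
    silent = [(t, False) for t in range(11)]
    print("fixed logic:", timeout_detected(silent, 5, reset_each_loop=False))
    print("buggy logic:", timeout_detected(silent, 5, reset_each_loop=True))
```

Note that this models the receiver-side wal_receiver_timeout from the release note, not the walsender timeout from my original error message.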

