Re: terminating walsender process due to replication timeout

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: terminating walsender process due to replication timeout
Date: 2019-05-24 06:23:50
Message-ID: 8ef8fcb4-e893-58ab-8d48-4a8d802ab5f2@matrix.gatewaynet.com
Lists: pgsql-general

On 23/5/19 5:05 p.m., AYahorau(at)ibagroup(dot)eu wrote:
> Hello Everyone!
>
> I can describe the issue I faced in a simplified form.
> I have 2 nodes in the DB cluster: a master and a standby.
> I create a simple table on the master node with the following command via psql:
> CREATE TABLE table1 (a INTEGER);
> After this I fill the table with the COPY command from a file which contains 2000000 (2 million) entries.
>
> When I then run, for example, this command:
> UPDATE table1 SET a='1'
> or this one:
> DELETE FROM table1;
> I see the following entry in the PostgreSQL log: terminating walsender process due to replication timeout.
>
> I suppose that this issue is caused by the small value of wal_sender_timeout=1s and the long runtime of the queries (they take about 15 seconds).
>
> What is the best way to deal with it? How can I avoid this? Is there any additional configuration that can help here?
I have set my wal_sender_timeout to 15min. No problems for over 7 months, knock on wood.
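The arithmetic behind Andrei's hypothesis can be made concrete with a small Python model. It is a simplified sketch, not PostgreSQL code: it assumes the sender gives up once no reply has arrived from the standby for longer than wal_sender_timeout. The reply timestamps (a roughly 15 s silence during the bulk UPDATE) are hypothetical values chosen to mirror the test described above, and the helper name first_timeout is made up for illustration.

```python
from typing import Optional, Sequence


def first_timeout(reply_times_s: Sequence[float], timeout_s: float) -> Optional[float]:
    """Return the time (s) at which the sender would give up, or None.

    Simplified model (assumption): the sender terminates the connection when
    the gap since the last standby reply exceeds timeout_s. Only gaps between
    the observed replies are checked.
    """
    for prev, nxt in zip(reply_times_s, reply_times_s[1:]):
        if nxt - prev > timeout_s:
            return prev + timeout_s  # deadline passed before the next reply arrived
    return None


if __name__ == "__main__":
    # Hypothetical reply times: steady replies, then ~15 s of silence
    # while the standby works through the WAL from the bulk UPDATE.
    replies = [0.0, 0.5, 1.0, 16.0, 16.5]
    for timeout in (1.0, 900.0):  # 1 s (Andrei) vs. 15 min (Achilleas)
        print(f"timeout={timeout}s -> gives up at {first_timeout(replies, timeout)}")
```

With these numbers the 1 s timeout trips at t = 2.0 s, while the 15 min setting never trips. The trade-off is that a longer timeout also delays detection of a genuinely dead standby.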
>
>
> Regards,
> Andrei
>
>
>
> From: Andrei Yahorau/IBA
> To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>,
> Cc: pgsql-general(at)postgresql(dot)org, rene(dot)romero(dot)b(at)gmail(dot)com
> Date: 17/05/2019 11:04
> Subject: Re: terminating walsender process due to replication timeout
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> Hello.
>
> Thanks for the answer.
>
> Can frequent database operations cause a standby server to fall behind? Is there a way to avoid this situation?
> I checked that the walsender works well in my test if I set wal_sender_timeout to at least 5 seconds.
>
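How far a standby has fallen behind can be expressed in bytes of WAL, as the distance between two LSNs. Examples are the primary's current position and the standby's replay_lsn from pg_stat_replication; on the server, pg_wal_lsn_diff() (PostgreSQL 10 and later) does this subtraction. The sketch below reproduces that arithmetic in Python, assuming the usual textual LSN form "X/Y", where X is the high 32 bits and Y the low 32 bits, both in hex. The function names and LSN values are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit integer."""
    hi, sep, lo = text.partition("/")
    if not sep or not hi or not lo:
        raise ValueError(f"not an LSN: {text!r}")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL between two positions; positive when the standby is behind."""
    return parse_lsn(primary_lsn) - parse_lsn(standby_lsn)


if __name__ == "__main__":
    # Hypothetical positions: the standby trails the primary by 32 MiB.
    print(lag_bytes("0/5000000", "0/3000000"))
    # The subtraction also works across the 32-bit boundary of the low part.
    print(lag_bytes("1/0", "0/FFFFFF00"))
```

Sampling this value while the bulk UPDATE or DELETE runs would show whether the standby is actually lagging, independently of whether the 1 s timeout fires.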
> Best regards,
> Andrei Yahorau
>
>
>
>
> From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> To: AYahorau(at)ibagroup(dot)eu,
> Cc: rene(dot)romero(dot)b(at)gmail(dot)com, pgsql-general(at)postgresql(dot)org
> Date: 16/05/2019 10:36
> Subject: Re: terminating walsender process due to replication timeout
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> Hello.
>
> At Wed, 15 May 2019 10:04:12 +0300, AYahorau(at)ibagroup(dot)eu wrote in <OF99D0D839(dot)6A5BCB70-ON432583FB(dot)0025912E-432583FB(dot)0026D664(at)iba(dot)by>
> > Hello,
> > Thank You for the response.
> >
> > Yes, it is possible to monitor replication delay, but my questions were
> > not about monitoring network issues.
> >
> > I use exactly wal_sender_timeout=1s because it allows me to detect
> > replication problems quickly.
>
> Though I don't have an exact idea of your configuration, it seems
> to me that your standby is simply falling more than one second
> behind the master. If you regard that as a replication problem,
> then the configuration can be said to be detecting the problem
> correctly.
>
> Since the keep-alive packet is sent in-band, it cannot reach the
> standby before the packets that were already sent but not yet processed.
>
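Kyotaro's point about in-band keepalives can be illustrated with a toy single-queue model in Python. The standby handles messages strictly in the order they were sent, so a keepalive queued behind a burst of WAL is only answered once that burst has been processed. This is a hypothetical sketch, not PostgreSQL's protocol code. It ignores network latency, and the message costs (30 chunks of 0.5 s, about 15 s in total) are invented to echo the timing reported in the thread.

```python
from typing import List, Tuple

# (arrival time in s, message kind, processing cost in s), in send order
Message = Tuple[float, str, float]


def keepalive_answer_times(messages: List[Message]) -> List[float]:
    """Times at which the standby gets to each keepalive (toy FIFO model)."""
    clock = 0.0
    answers = []
    for arrival, kind, cost in messages:
        # In-band: a message cannot be handled before it arrives, nor before
        # everything sent ahead of it has been handled.
        clock = max(clock, arrival) + cost
        if kind == "keepalive":
            answers.append(clock)
    return answers


if __name__ == "__main__":
    burst = [(0.0, "wal", 0.5)] * 30          # hypothetical bulk-UPDATE burst
    msgs = burst + [(0.5, "keepalive", 0.0)]  # keepalive sent shortly afterwards
    (answered,) = keepalive_answer_times(msgs)
    print(f"keepalive sent at 0.5 s, answered at {answered} s")
```

With these numbers the keepalive sent at 0.5 s is answered at 15.0 s, far beyond a 1 s wal_sender_timeout, even though nothing is wrong with the network.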
> > So, I need clarification on the following questions:
> > Is it possible to use exactly this configuration and be sure that it will
> > work properly?
> > What did I do wrong? Should I correct my configuration somehow?
> > Is this the same issue as the one mentioned here?
> > https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
> > If so, why do I face this problem again?
>
> It is not the same "problem". What was mentioned there is a fast
> network keeping the sender-side loop busy, which prevents the
> keepalive packet from being sent.
>
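The different failure mode Kyotaro refers to, a sender loop kept so busy that it never gets around to sending a keepalive, can be shown with a deliberately naive toy loop. This is a hypothetical illustration of the pattern only, not the walsender's actual code. The naive variant checks its keepalive timer only when its outgoing queue is empty, so as long as there is always more data to send, the keepalive is starved. The fixed variant checks the timer on every iteration. All names and parameters are made up.

```python
def keepalives_sent(ticks: int, arrivals: int, capacity: int,
                    interval: int, naive: bool) -> int:
    """Count keepalives a toy sender loop manages to send.

    Each tick, `arrivals` WAL chunks are queued and up to `capacity` are sent.
    A keepalive is due every `interval` ticks. The naive loop only looks at the
    keepalive timer when nothing is left to send (the flaw being illustrated).
    """
    pending = 0
    last_keepalive = 0
    sent = 0
    for tick in range(1, ticks + 1):
        pending += arrivals
        pending -= min(pending, capacity)  # send as much as possible this tick
        due = tick - last_keepalive >= interval
        if due and (not naive or pending == 0):
            sent += 1
            last_keepalive = tick
    return sent


if __name__ == "__main__":
    # Hypothetical load: more data queued per tick than is sent, so the
    # queue never drains.
    print(keepalives_sent(10, arrivals=3, capacity=2, interval=2, naive=True))
    print(keepalives_sent(10, arrivals=3, capacity=2, interval=2, naive=False))
```

Under this sustained load the naive loop sends no keepalive at all in 10 ticks, while checking the timer on every iteration yields one every 2 ticks.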
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
