Re: Logical replication timeout problem

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2021-09-20 16:13:24
Message-ID: CAA5-nLA8C0LZdVDdPDNeWcJYGzGVV3nJ7Ni0PdSyHJNxi628ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

By passing the autovacuum parameter to off the problem did not occur right
after loading the table as in our previous tests. However, the timeout
occurred later. We have seen the accumulation of .snap files for several Gb.

...
-rw-------. 1 postgres postgres 16791226 Sep 20 15:26
xid-1238444701-lsn-2D2B-F5000000.snap
-rw-------. 1 postgres postgres 16973268 Sep 20 15:26
xid-1238444701-lsn-2D2B-F6000000.snap
-rw-------. 1 postgres postgres 16790984 Sep 20 15:26
xid-1238444701-lsn-2D2B-F7000000.snap
-rw-------. 1 postgres postgres 16988112 Sep 20 15:26
xid-1238444701-lsn-2D2B-F8000000.snap
-rw-------. 1 postgres postgres 16864593 Sep 20 15:26
xid-1238444701-lsn-2D2B-F9000000.snap
-rw-------. 1 postgres postgres 16902167 Sep 20 15:26
xid-1238444701-lsn-2D2B-FA000000.snap
-rw-------. 1 postgres postgres 16914638 Sep 20 15:26
xid-1238444701-lsn-2D2B-FB000000.snap
-rw-------. 1 postgres postgres 16782471 Sep 20 15:26
xid-1238444701-lsn-2D2B-FC000000.snap
-rw-------. 1 postgres postgres 16963667 Sep 20 15:27
xid-1238444701-lsn-2D2B-FD000000.snap
...

2021-09-20 17:11:29 CEST [12687]: [1283-1] user=,db=,client= LOG:
checkpoint starting: time
2021-09-20 17:11:31 CEST [12687]: [1284-1] user=,db=,client= LOG:
checkpoint complete: wrote 13 buffers (0.0%); 0 WAL file(s) added, 0
removed, 0 recycled; write=1.713 s, sync=0.001 s, total=1.718 s
; sync files=12, longest=0.001 s, average=0.001 s; distance=29 kB,
estimate=352191 kB
2021-09-20 17:12:43 CEST [59986]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout
2021-09-20 17:12:43 CEST [12546]: [1068-1] user=,db=,client= LOG: worker
process: logical replication worker for subscription 24215702 (PID 59986)
exited with exit code 1
2021-09-20 17:12:43 CEST [39945]: [1-1] user=,db=,client= LOG: logical
replication apply worker for subscription "sub" has started
2021-09-20 17:12:43 CEST [39946]: [1-1] user=repuser,db=db,client=127.0.0.1
LOG: received replication command: IDENTIFY_SYSTEM

Regards,

Fabrice

On Mon, Sep 20, 2021 at 1:51 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema
> of the publisher, this generates the worker's timeout after one minute from
> the end of this load. The table on which this load is executed is not
> replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=db012a00,client=[local] LOG: duration: 1281408.171 ms
> statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:
> automatic analyze of table "db.table " system usage: CPU: user: 4.13 s,
> system: 0.55 s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout
> I thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>
> --
> With Regards,
> Amit Kapila.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcos Pegoraro 2021-09-20 16:16:39 logical replication restrictions
Previous Message Tom Lane 2021-09-20 15:49:53 Re: So, about that cast-to-typmod-minus-one business