Timeout in Logical Replication

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Timeout in Logical Replication
Date: 2023-07-29 13:49:57
Message-ID: 20230729135001.6E953220100@thebes.rime.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Logical replication from PG 15.3 to 15.2 has been running without problems until a few days ago.

Now the subscriber shows repeated messages like:
```
2023-07-29 08:25:04.523 UTC [26] LOG: checkpoint complete: wrote 8692 buffers (53.1%); 0 WAL file(s) added, 1 removed, 14 recycled; write=269.921 s, sync=0.485 s, total=270.438 s; sync files=37, longest=0.224 s, average=0.014 s; distance=230568 kB, estimate=436766 kB

2023-07-29 08:25:34.550 UTC [26] LOG: checkpoint starting: time

2023-07-29 08:27:55.699 UTC [142] ERROR: could not receive data from WAL stream: server closed the connection unexpectedly

This probably means the server terminated abnormally
before or while processing the request.

2023-07-29 08:27:55.702 UTC [159] LOG: logical replication apply worker for subscription "<SUB_NAME>" has started

2023-07-29 08:27:55.706 UTC [1] LOG: background worker "logical replication worker" (PID 142) exited with exit code 1
```

And the publisher shows repeated messages like:
```
2023-07-29 08:24:50.341 UTC [530982] STATEMENT: START_REPLICATION SLOT "<SUB NAME>" LOGICAL 37D1/1E0DD9A0 (proto_version '3', publication_names '"<PUB NAME>"')
2023-07-29 08:27:36.956 UTC [530982] LOG: terminating walsender process due to replication timeout
2023-07-29 08:27:36.956 UTC [530982] CONTEXT: slot "<SUB NAME>", output plugin "pgoutput", in the change callback, associated LSN 37D0/F9E8C2E8
```

I can connect using `psql` from either node back to the other. As far as I have been able to determine no routing or firewall changes have been made.

Reading other similar reports suggests that deleting and recreating the sub will fix the problem, but I'd like to understand/avoid it.

Any suggestions on how to track this down would be appreciated.

-

Browse pgsql-general by date

  From Date Subject
Next Message Erik Wienhold 2023-07-29 16:57:44 Re: How to get an md5/sha256 hash of a really large object in psql?
Previous Message Alban Hertroys 2023-07-29 09:37:55 Re: How to improve the performance of my SQL query?