
Logical replication stuck in catchup state

From:	Dan shmidt <dshmidt(at)hotmail(dot)com>
To:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Logical replication stuck in catchup state
Date:	2020-06-09 21:30:38
Message-ID:	MN2PR02MB64478461B0651F7774137B87A4820@MN2PR02MB6447.namprd02.prod.outlook.com
Lists:	pgsql-general

Hi All,

We have a setup in which there are several master nodes replicating to a single slave/backup node. We are using Postgres 11.4.
Recently, one of the nodes appears to have gotten stuck and stopped replicating.
I did some basic troubleshooting but could not find the root cause.

On one hand:
- the replication slot does seem to be active according to pg_replication_slots (sorry, no screenshot)
- on the slave node, last_msg_receipt_time in pg_stat_subscription keeps updating

On the other hand:
- on the slave node, received_lsn in pg_stat_subscription keeps pointing to the same WAL segment
- the difference redo_lsn - restart_lsn shows ~20GB of lag
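
For reference, the lag figure above is the byte distance between two LSNs. Below is a minimal sketch of that arithmetic, assuming the usual textual LSN format of two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). The helper names and the sample values are hypothetical and are not taken from the screenshots. On the server side, pg_wal_lsn_diff() performs the same subtraction.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position."""
    hi, lo = lsn.strip().split("/")
    # The part before the slash holds the high 32 bits, the part after the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_lag_bytes(newer: str, older: str) -> int:
    """Return how many bytes of WAL lie between two LSNs (newer minus older)."""
    return lsn_to_bytes(newer) - lsn_to_bytes(older)


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to illustrate a ~20GB gap.
    redo_lsn = "1A/00000000"
    restart_lsn = "15/00000000"
    lag = lsn_lag_bytes(redo_lsn, restart_lsn)
    print(f"{lag / 1024**3:.1f} GiB")  # prints "20.0 GiB"
```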

According to the logs on the master, the WAL sender hits a timeout. Increasing wal_sender_timeout, even setting it to 0 (no timeout), has no effect. Yet last_msg_receipt_time keeps being updated. How is that possible?

Screenshots attached. The stuck subscription/replication slot is the one ending with "53db6". In images with more than one row, it is the second row.

Any suggestions on what may be the root cause or how to continue debugging?
Appreciate your help.

Thank you,
Dan.

Attachment	Content-Type	Size
pg_stat_replication.png	image/png	83.6 KB
	image/png	51.9 KB
pg_stat_subscription.png	image/png	136.7 KB
pg_subscription.png	image/png	173.9 KB

Responses

Re: Logical replication stuck in catchup state at 2020-06-09 21:45:28 from Michael Lewis
Re: Logical replication stuck in catchup state at 2020-06-09 22:21:00 from Peter Eisentraut
Re: Logical replication stuck in catchup state at 2020-06-10 06:15:48 from Dan shmidt
