From: | Kouber Saparev <kouber(at)gmail(dot)com> |
---|---|
To: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Master database delaying replication |
Date: | 2023-08-03 21:15:48 |
Message-ID: | CAN4RuQsnX-3+Qj+2VXmJZDY08SWNDygXfTd66pUL5sWiQFcMCA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hello,
We are using PostgreSQL 15.3 with 5 servers configured for streaming
replication:
DB1 = master database
DB2 = replica of DB1
Warehouse 1 = replica of DB2 (cascaded)
Warehouse 2 = replica of DB2 (cascaded)
Remote Warehouse = replica of DB2 (cascaded)
Every now and then we experience replication lag in 2 cases:
1. Heavy writing on DB1, in which case Remote Warehouse is lagging the
most, since the network bandwidth to it is limited;
2. Slow query executing on one of the replicas, in which case the WAL
replaying is paused until the query finishes.
Yet, in one day we had twice a strange situation when all of a sudden all
the replicas started lagging at the same time. It lasted for about 30
minutes when it auto-fixed itself. The simultaneity excluded case #2 cited
above, and the fact that the last replica to recover was actually not the
usual Remote Warehouse, but Warehouse 2, excluded also case #1, as if there
was so heavy writing, then the last to recover would be the furthest
topologically.
In the screenshot above the green line is the replica delay of Warehouse 2
(the last to recover with the maximal lag), while the red one - Remote
Warehouse (the first to recover with minimal lag).
Any ideas why something like that would happen? What should I be looking
for in the logs? What is the theoretical reason for this?
Kind regards,
--
Kouber Saparev
From | Date | Subject | |
---|---|---|---|
Next Message | Kouber Saparev | 2023-08-04 03:11:37 | Re: Master database delaying replication |
Previous Message | kyle Hailey | 2023-08-03 20:43:36 | Re: Conversion to partitioning table |