Re: Master database delaying replication

From: Kouber Saparev <kouber(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Master database delaying replication
Date: 2023-08-04 03:11:37
Message-ID: CAN4RuQs0jWesL0urRahicrRABHYZ4cDbYr7BgBQqDd3axWZs_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

In addition, we monitor the amount of WALs generated per minute. We did not
observe any peaks within this time frame. The rate stays within the normal
30-40 (per minute) range. In contrast, when we repack some huge tables, the
write rate goes up to 400+ WALs per minute and then Remote Warehouse lags
behind.
So for some reason, the master database (DB1) is stalling all the WAL
senders (1 in fact, as only DB2 is directly connected to it) while it
executes something. The question is what is this thing? What might cause
such behaviour?

На пт, 4.08.2023 г. в 0:15 ч. Kouber Saparev <kouber(at)gmail(dot)com> написа:

> Hello,
>
> We are using PostgreSQL 15.3 with 5 servers configured for streaming
> replication:
>
> DB1 = master database
> DB2 = replica of DB1
> Warehouse 1 = replica of DB2 (cascaded)
> Warehouse 2 = replica of DB2 (cascaded)
> Remote Warehouse = replica of DB2 (cascaded)
>
> Every now and then we experience replication lag in 2 cases:
> 1. Heavy writing on DB1, in which case Remote Warehouse is lagging the
> most, since the network bandwidth to it is limited;
> 2. Slow query executing on one of the replicas, in which case the WAL
> replaying is paused until the query finishes.
>
> Yet, in one day we had twice a strange situation when all of a sudden all
> the replicas started lagging at the same time. It lasted for about 30
> minutes when it auto-fixed itself. The simultaneity excluded case #2 cited
> above, and the fact that the last replica to recover was actually not the
> usual Remote Warehouse, but Warehouse 2, excluded also case #1, as if there
> was so heavy writing, then the last to recover would be the furthest
> topologically.
>
> https://ibb.co/hWmhjkL
>
> In the screenshot above the green line is the replica delay of Warehouse 2
> (the last to recover with the maximal lag), while the red one - Remote
> Warehouse (the first to recover with minimal lag).
>
> Any ideas why something like that would happen? What should I be looking
> for in the logs? What is the theoretical reason for this?
>
> Kind regards,
> --
> Kouber Saparev
>
>
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Rajesh Kumar 2023-08-04 09:03:44 Archive cleanup
Previous Message Kouber Saparev 2023-08-03 21:15:48 Master database delaying replication