replication lag despite corrective config

From: Wyatt Alt <wyatt(dot)alt(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: replication lag despite corrective config
Date: 2018-11-20 01:46:55
Message-ID: CAGem3qCCP5c7JZUMApteK7TXT2yMzVi_Yy2nsZ5Pn=R30LrDvw@mail.gmail.com
Lists: pgsql-general

I've been struggling to eliminate replication lag on a Postgres 9.6.6
instance on Amazon RDS. I believe the lag is caused by early cleanup
conflicts from vacuums on the master, because I can reliably resolve it by
killing long-running queries on the standby. I most recently saw ten hours
of lag on Saturday and addressed it this way.

The standby is running with:
    hot_standby_feedback = on
    max_standby_streaming_delay = 5min
    max_standby_archive_delay = 30s

I am not using replication slots on the primary due to reported negative
interactions with pg_repack on large tables.

My rationale for the first two settings is that hot_standby_feedback should
address my issues almost all the time, but that max_standby_streaming_delay
would sometimes be necessary as a fallback, for instance in cases of a
transient connection loss between the standby and primary. I believe these
settings are mostly working, because lag is less frequent than it was before
I configured them.

My questions are:
* Am I overlooking anything in my configuration?
* What would explain lag caused by query conflicts given the
max_standby_streaming_delay setting? Shouldn't those queries be getting
killed?
* Is there any particular diagnostic info I should be collecting on the
next occurrence, to help me figure out the cause? Note that as I'm on RDS,
I don't have direct access to the datadir -- just psql.
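
A minimal sketch of what can be gathered from psql alone during the next
occurrence. It assumes the 9.6 function and column names (pg_last_xlog_*,
*_location), which were renamed in version 10, and it assumes the RDS role in
use is allowed to read these views in full; neither has been checked against
this instance.

```sql
-- On the standby: confirm the settings actually in effect.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay');

-- On the standby: WAL received but not yet replayed, and time since the
-- last replayed commit (the latter also grows when the primary is idle).
SELECT now() - pg_last_xact_replay_timestamp() AS time_since_last_replay,
       pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location()) AS received_not_replayed_bytes;

-- On the standby: cumulative counts of queries cancelled by recovery
-- conflicts, per database and per conflict type.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin,
       confl_tablespace, confl_deadlock
FROM pg_stat_database_conflicts;

-- On the standby: open transactions, oldest first, with the snapshot
-- xmin each one is holding.
SELECT pid, usename, state, xact_start, backend_xmin,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- On the primary: replication state and the xmin reported by the standby
-- (backend_xmin is only populated when hot_standby_feedback is arriving).
SELECT application_name, state, sent_location, replay_location, backend_xmin
FROM pg_stat_replication;
```

Capturing the output of all five while the lag is present, and again after
killing the long-running standby queries, should show whether replay was
paused behind a specific transaction and whether any conflicts were counted.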

Thanks for any advice!
Wyatt
