From: | Wyatt Alt <wyatt(dot)alt(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: replication lag despite corrective config |
Date: | 2018-11-20 03:45:56 |
Message-ID: | CAGem3qDhWBr3RU9vK=GRr2PN9fa84boCqsBrK243=6doZXunJw@mail.gmail.com |
Lists: | pgsql-general |
Sorry, I see now there was a similar question a few days ago:
https://www.postgresql.org/message-id/CAJw4d1WtzOdYzd8Nq2=uFK+Z0JY0L_pfg9TvCWPrmt3NCZq9GA@mail.gmail.com
Two ideas were proposed (aside from disconnects):
* Autovacuum is truncating a page on the master and taking an
AccessExclusiveLock on the table in use on the replica
* A "pin conflict", which I'm still unfamiliar with.
The user's response says they are in the first bucket, but the argument
relies on max_standby_streaming_delay being set to -1, while mine is 5
minutes. I need to understand pin conflicts better, but the likely scenario
Andrew outlined doesn't apply to me. My offending queries were doing bitmap
heap scans on a 300GB table.
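
To make the difference between -1 and 5 minutes concrete, here is a rough
model of how I understand the documented behaviour: the delay is measured
from when the WAL data was received on the standby, not from when the
conflicting query started, and -1 means wait forever. The function name and
arguments are made up for illustration; this is not PostgreSQL's actual code.

```python
from typing import Optional


def should_cancel_conflicting_queries(
    now_s: float,
    wal_received_at_s: float,
    max_delay_s: Optional[float],
) -> bool:
    """Simplified model of max_standby_streaming_delay (hypothetical helper).

    max_delay_s=None stands in for the setting -1 (wait forever).
    """
    if max_delay_s is None:
        # With -1 the standby never cancels queries; replay simply waits.
        return False
    # The grace period counts from WAL receipt, so once replay has fallen
    # behind, later conflicting queries get little or no grace time.
    return now_s - wal_received_at_s >= max_delay_s
```

Under this model a 5 minute setting should bound how long a single conflict
can hold up replay, which is why hours of lag look surprising to me.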
Reading the thread I see Andres ask for the "precise conflict" the user
gets -- is there a way I can get that without a datadir? And to re-frame
the original question, are there causes of replication lag that
max_standby_streaming_delay would not be expected to prevent, that would be
resolved by killing long standby queries? If so, what's the best way to
confirm?
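
One thing that looks reachable from psql alone is the
pg_stat_database_conflicts view on the standby, which counts cancelled
queries per conflict type. Below is a minimal sketch of how two snapshots of
that view could be compared; the helper name and the dict layout are my own
assumptions, and the counters only move when queries are actually cancelled,
so they may not explain lag where nothing is being killed.

```python
# Query to run on the standby via psql (the view is only populated there).
CONFLICT_SQL = """
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();
"""


def conflict_deltas(before: dict, after: dict) -> dict:
    """Return the confl_* counters that increased between two snapshots.

    Hypothetical helper: each snapshot is one row of the query above,
    represented as a column-name -> value dict.
    """
    deltas = {}
    for column, value in after.items():
        if not column.startswith("confl_"):
            continue  # skip datname and any other non-counter columns
        increase = value - before.get(column, 0)
        if increase > 0:
            deltas[column] = increase
    return deltas
```

If confl_snapshot is the counter that grows, that would point at early
cleanup; confl_lock would fit the AccessExclusiveLock idea, and
confl_bufferpin the pin conflict.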
Wyatt
On Mon, Nov 19, 2018 at 5:46 PM Wyatt Alt <wyatt(dot)alt(at)gmail(dot)com> wrote:
> I've been struggling to eliminate replication lag on a Postgres 9.6.6
> instance on Amazon RDS. I believe the lag is caused by early cleanup
> conflicts from vacuums on the master, because I can reliably resolve it by
> killing long-running queries on the standby. I most recently saw ten hours
> of lag on Saturday and addressed it this way.
>
> The standby is running with
> hot_standby_feedback = on
> max_standby_streaming_delay = 5min
> max_standby_archive_delay = 30s
>
> I am not using replication slots on the primary due to reported negative
> interactions with pg_repack on large tables.
>
> My rationale for the first two settings is that hot_standby_feedback
> should address my issues almost all the time, but that
> max_standby_streaming_delay would sometimes be necessary as a fallback, for
> instance in cases of a transient connection loss between the standby and
> primary. I believe these settings are mostly working, because lag is less
> frequent than it was when I configured them.
>
> My questions are,
> * Am I overlooking anything in my configuration?
> * What would explain lag caused by query conflicts given the
> max_standby_streaming_delay setting? Shouldn't those queries be getting
> killed?
> * Is there any particular diagnostic info I should be collecting on the
> next occurrence, to help me figure out the cause? Note that as I'm on RDS,
> I don't have direct access to the datadir -- just psql.
>
> Thanks for any advice!
> Wyatt
>