Re: conflict with recovery when delay is gone

From: Mohamed Wael Khobalatte <mkhobalatte(at)grubhub(dot)com>
To: Radoslav Nedyalkov <rnedyalkov(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: conflict with recovery when delay is gone
Date: 2020-11-14 22:48:38
Message-ID: CABZeWdy5GB9F6r+QJ8QSsX_5+_8kzb9Kbfif5f-aP1k4Vf=ExA@mail.gmail.com
Lists: pgsql-general

On Sat, Nov 14, 2020 at 2:46 PM Radoslav Nedyalkov <rnedyalkov(at)gmail(dot)com>
wrote:

>
>
> On Fri, Nov 13, 2020 at 8:13 PM Radoslav Nedyalkov <rnedyalkov(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Fri, Nov 13, 2020 at 7:37 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
>> wrote:
>>
>>> On Fri, 2020-11-13 at 15:24 +0200, Radoslav Nedyalkov wrote:
>>> > On a very busy master-standby setup which runs typical OLAP processing
>>> > (long-living, write-heavy statements), we're getting on the standby:
>>> >
>>> > ERROR: canceling statement due to conflict with recovery
>>> > FATAL: terminating connection due to conflict with recovery
>>> >
>>> > The weird thing is that cancellations usually happen after the standby
>>> > has experienced some huge delay (2h), still not at the allowed maximum
>>> > (3h). Even recently started statements got cancelled when the delay was
>>> > already at zero.
>>> >
>>> > Sometimes the situation got relaxed after an hour or so.
>>> > Restarting the server instantly helps.
>>> >
>>> > It is pg11.8, centos7, hugepages, shared_buffers 196G from 748G.
>>> >
>>> > What phenomenon could we be facing?
>>>
>>> Hard to say. Perhaps an unusual kind of replication conflict?
>>>
>>> What is in "pg_stat_database_conflicts" on the standby server?
>>>
>>
>> db01=# select * from pg_stat_database_conflicts;
>>  datid |  datname  | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
>> -------+-----------+------------------+------------+----------------+-----------------+----------------
>>  13877 | template0 |                0 |          0 |              0 |               0 |              0
>>  16400 | template1 |                0 |          0 |              0 |               0 |              0
>>  16402 | postgres  |                0 |          0 |              0 |               0 |              0
>>  16401 | db01      |                0 |          0 |             51 |               0 |              0
>> (4 rows)
>>
>> On a freshly restarted standby we've just got similar behaviour after a
>> 2-hour delay and a slow catch-up.
>> confl_snapshot is 51 and we have exactly the same number of cancelled
>> statements.
>>
>>
> No luck so far. Searching for an explanation, I found that we fall into the
> unexplained case where snapshot conflicts happen even though
> hot_standby_feedback is on.
>
> Thanks,
> Rado
>
>

Perhaps you have a value set for old_snapshot_threshold? If not, do the
walreceiver connections drop out?
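
In case it helps, here is a rough sketch of how both could be checked. It
assumes the view and column names as they exist in PostgreSQL 11 (for example,
received_lsn was renamed in later releases), so adjust it for other versions.
The first two queries are meant for the standby, the last one for the primary.

```sql
-- On the standby: settings relevant to snapshot conflicts.
-- old_snapshot_threshold = -1 means the feature is disabled.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('old_snapshot_threshold',
               'hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay');

-- On the standby: is the walreceiver connected and still receiving?
-- No row at all means no walreceiver process is running right now.
SELECT pid, status, received_lsn, last_msg_receipt_time, latest_end_time
FROM pg_stat_wal_receiver;

-- On the primary: with hot_standby_feedback working, backend_xmin is
-- normally non-NULL for the standby while queries are running there.
SELECT application_name, state, backend_xmin, replay_lag
FROM pg_stat_replication;
```

If the walreceiver row disappears or its pid changes while the standby is
catching up, the feedback would not be reaching the primary during that time,
which might explain snapshot conflicts despite hot_standby_feedback being on.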
