Re: Long running query causing XID limit breach

From: sud <suds1434(at)gmail(dot)com>
To: Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Long running query causing XID limit breach
Date: 2024-05-26 09:15:50
Message-ID: CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
wrote:

> On Sat, May 25, 2024 at 11:00 PM sud <suds1434(at)gmail(dot)com> wrote:
>
>>
>> But i have one question here , does max_standby_streaming_delay = 14 ,
>> means the queries on the standby will get cancelled after 14 seconds?
>>
>
> No, your query gets cancelled when it stalls replication for >14 sec. If
> your master is idle and does not send any WAL and the replica has
> caught up, the query can take as long as it wants.
>

Thank you so much.
For example , in below scenario,
if i have insert query going on on primary instance on table 25th may
partition of TABLE1, and at same time we are selecting data from 24th May
partition , then with "max_standby_streaming_delay = 14" setup , it just
allows the select query to run for any duration without any restriction
even if the WAL gets applied on the standby regularly. Also INSERT query in
primary won't make the standby SELECT queries to cancel as because the WAL
record of INSERT queries on the primary instance is not conflicting to the
exact rows those were being read by the standby. Is my understanding
correct here?

However, if i have Update/Delete query going on on primary instance on
table 25th may partition of TABLE1 and on the exact same set of rows which
were being read by the standby instance by the SELECT query, then the
application of such WAL record to standby can max wait for 14 seconds and
thus those select query are prone to be cancelled after 14 seconds. Is this
understanding correct?

If the above is true then it doesn't look good, as because in an OLTP
system there will be a lot of DMLS happening on the writer instances and
there may be many queries running on the reader/standby instances which are
meant to run for hours. And if we say making those SELECT queries run for
hours means compromising an hour of "high availability"/RPO or a lag of an
hour between primary and standby , that doesn't look good. Please
correct me if I am missing something here.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Victor Dobrovolsky 2024-05-26 13:20:10 scalar plpgsql functions and their stability flags
Previous Message Torsten Förtsch 2024-05-26 08:13:28 Re: Long running query causing XID limit breach