Re: Long running query causing XID limit breach

From: sud <suds1434(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Long running query causing XID limit breach
Date: 2024-05-23 08:11:31
Message-ID: CAD=mzVVvK8xk-9m8h3Xu27cGN7BW329HKYdO+0EMXfWvSD3AGA@mail.gmail.com
Lists: pgsql-general

On Thu, May 23, 2024 at 1:22 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:

> On Thu, 2024-05-23 at 02:46 +0530, sud wrote:
> > It's RDS postgres version 15.4. We suddenly saw the "MaximumUsedTransactionIDs"
> > reach to ~1.5billion and got alerted by team members who mentioned the database
> > is going to be in shutdown/hung if this value reaches to ~2billion and won't be
> > able to serve any incoming transactions. It was a panic situation.
> >
> > I have heard of it before, because of the way postgres works and the XID being
> > a datatype of length 32 bit integer can only represent (2^32)/2=~2 billion
> > transactions. However, as RDS performs the auto vacuum, we thought that we need
> > not worry about this issue. But it seems we were wrong. And we found one adhoc
> > "SELECT" query was running on the reader instance since the last couple of
> > days and when that was killed, the max xid (MaximumUsedTransactionIDs) dropped
> > to 50million immediately.
>
> This has nothing to do with autovacuum running.
> PostgreSQL won't freeze any rows above the xmin horizon (see the
> "backend_xmin" column in "pg_stat_activity").
>
> > So I have few questions,
> >
> > 1) This system is going to be a 24/7 up and running system which will process
> > ~500million business transactions/day in future, i.e. ~4-5billion rows/day
> > inserted across multiple tables each day. And as I understand each row will
> > have an XID allocated. So in that case, does it mean that we will need
> > (5billion/24)=~200million XID/hour and thus, if any such legitimate
> > application "SELECT" query keeps running for ~10 hours (and thus keeps the
> > historical XID alive), then it can saturate the "MaximumUsedTransactionIDs"
> > and make the database stand still in 2billion/200million=~10hrs. Is this
> > understanding correct? Seems we are prone to hit this limit sooner going forward.
>
> Yes, that is correct. You cannot run such long-running queries with a
> transaction rate like that.
>
>
When you say transaction, does that mean one commit? For example, if it's
inserting and committing ~1000 rows in one batch, then all 1000 rows will be
marked with one XID rather than 1000 different XIDs, and so we should look at
batch processing rather than row-by-row processing. Is this understanding
correct?
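
For reference, this is how I plan to verify that on a scratch database (a
minimal sketch; the table name "t" and the 1000-row batch are just
illustrative):

-- all rows inserted in one transaction should carry the same xmin,
-- i.e. the whole batch consumes a single XID
BEGIN;
CREATE TABLE t (id int);
INSERT INTO t SELECT generate_series(1, 1000);
COMMIT;

SELECT DISTINCT xmin FROM t;   -- expect exactly one value

If that holds, moving our load from row-by-row autocommit inserts to batched
commits should cut the XID consumption roughly by the batch size.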

> One thing you could consider is running the long-running queries on a standby
> server. Replication will get delayed, and you have to keep all the WAL around
> for the standby to catch up once the query is done, but it should work.
> You'd set "max_standby_streaming_delay" to -1 on the standby.
>
>
We have the "Select query" running on a reader instance , but still the
writer instance was showing up "MaximumUsedTransactionIDs" reaching
1.5billion, so it means both the instance as part of same cluster so
sharing same XIDs, and as per your suggestion we should run this in
separate standby cluster altogether which does not share same XID. Is this
understanding correct? or it can be handled even with another reader
instance by just tweaking some other parameter so that they won't share the
same XID?
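
Also, to catch such sessions earlier next time, I am thinking of monitoring the
xmin horizon with something like the below, based on the "backend_xmin" column
you mentioned (a rough sketch; the 60-character query truncation and the
ordering are just for readability):

-- list sessions holding back the xmin horizon, oldest first
SELECT pid, datname, usename, state,
       backend_xmin, age(backend_xmin) AS xmin_age,
       now() - xact_start AS xact_duration,
       left(query, 60) AS query_snippet
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

Anything sitting there with a very old backend_xmin would be a candidate for
pg_terminate_backend(), like the adhoc SELECT we killed.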
