Re: Long running query causing XID limit breach

From: David HJ <chuxiongzhong(at)gmail(dot)com>
To: sud <suds1434(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Long running query causing XID limit breach
Date: 2024-05-26 05:56:16
Message-ID: CAKabb9XsSKEzmYV+WKPptLFPVYbqrD_W8UJKiQqW5euyS2HZoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

anyone know how to describe from this mailing list?

On Thu, May 23, 2024 at 5:16 AM sud <suds1434(at)gmail(dot)com> wrote:

> Hello ,
> It's RDS postgres version 15.4. We suddenly saw the
> "MaximumUsedTransactionIDs" reach to ~1.5billion and got alerted by team
> members who mentioned the database is going to be in shutdown/hung if this
> value reaches to ~2billion and won't be able to serve any incoming
> transactions. It was a panic situation.
>
> I have heard of it before , because of the way postgres works and the XID
> being a datatype of length 32 bit integer can only represent (2^32)/2=~2
> billion transactions. However, as RDS performs the auto vacuum , we thought
> that we need not worry about this issue. But it seems we were wrong. And we
> found one adhoc "SELECT '' query was running on the reader instance since
> the last couple of days and when that was killed, the max xid
> (MaximumUsedTransactionIDs) dropped to 50million immediately.
>
> So I have few questions,
>
> 1)This system is going to be a 24/7 up and running system which will
> process ~500million business transactions/day in future i.e. ~4-5billion
> rows/day inserted across multiple tables each day. And as I understand each
> row will have XID allocated. So in that case , does it mean that, we will
> need (5billion/24)=~200million XID/hour and thus , if any such legitimate
> application "SELECT" query keeps running for ~10 hours (and thus keep the
> historical XID alive) , then it can saturate the
> "MaximumUsedTransactionIDs" and make the database standstill in
> 2billion/200million=~10hrs. Is this understanding correct? Seems we are
> prone to hit this limit sooner going forward.
>
> 2)We have some legitimate cases where the reporting queries can run for
> 5-6hrs. So in such cases if the start of this SELECT query happen at 100th
> XID on table TAB1, then whatever transactions happen after that time,
> across all other tables(table2, table3 etc) in the database won't get
> vacuum until that SELECT query on table1 get vacuumed(as database will try
> to keep that same 100th XID image) and the XID will just keep incrementing
> for new transaction, eventually reaching the max limit. Is my understanding
> correct here?
>
> 3)Although RDS does the auto vacuum by default. but should we also
> consider doing manual vacuum without impacting ongoing transactions?
> Something as below options
> vacuum freeze tab1;
> vacuum freeze;
> vacuum;
> vacuum analyze tab1;
> vacuum tab1;
>
> 4)Had worked in past in oracle database where the similar transaction
> identifier is called as "system change number" , but never encountered that
> being exhausted and also there it used to have UNDO record and if a SELECT
> query needs anything beyond certain limit(set undo_retention parameter) the
> select query used to fail with snapshot too old error but not impacting any
> write transactions. But in postgres it seems nothing like that happens and
> every "Select query" will try to run till its completion without any such
> failure, until it gets skilled by someone. Is my understanding correct?
>
> And in that case, It seems we have to mandatorily set "statement_timeout"
> to some value e.g. 4hrs(also i am not seeing a way to set it for any
> specific user level, so it will be set for all queries including
> application level) and also "idle_in_transaction_session_timeout" to
> 5minutes, even on all the prod and non prod databases, to restrict the long
> running transactions/queries and avoid such issues in future. Correct me if
> I'm wrong.
>
> Regards
> Sud
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Torsten Förtsch 2024-05-26 08:13:28 Re: Long running query causing XID limit breach
Previous Message sud 2024-05-25 20:59:48 Re: Long running query causing XID limit breach