From: | Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> |
---|---|
To: | Subramanian Krishnan <subramanian(dot)kris(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Re: Finding out the aging tables for vacuuming to prevent data loss on transaction id wraparound |
Date: | 2018-03-06 13:26:58 |
Message-ID: | 54b3074f-096e-fe83-a503-e35361d85553@portavita.eu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
From my experience is good practice to:
- Monitor the age of your records (we use icinga/nagios for that) to know on time when things are going wrong
- Schedule a periodic VACUUM of your db, if possible. Eg: if your db is heavily used during the day but resting at night, then during the night is a good idea to run a VACUUM. This will also offload your db during the day, because less autovacuum jobs will be triggered.
As additional note, if I remember correctly, a vacuum will be triggered to prevent wraparound also if you set 'autovacuum=off' in your configuration. That specific vacuum will have priority over normal vacuums.
For statistical purposes, a good formula to understand your actual 'burn rate' is to periodically run:
SELECT * FROM txid_current();
and compare results over time to have a good estimate on how many xid you are using daily.
Regards,
Fabio
On 03/06/2018 12:54 PM, Subramanian Krishnan wrote:
> Hello Laurenz,
>
> Thank you for taking the time out to read through and problem description and responding back.
>
> After posting the question on the forum, I continued investigating possible root causes. Through a flash of intuition I suspected if it could be a table which is strictly not a part of the database (not listed in pg_class) which could be contributing factor.
>
> pg_shdepend fitted the bill. So I did a vacuum on pg_catalog.pg_shdepend and post that re-ran the DB age query. The trick worked and the age came down to match the max age entry in pg_class. To be on the safer side, I did vacuuming of pg_shdescription and pg_shseclabel as well.
>
> Your suggestion of:
> 1) Making AV more aggressive and
> 2) Tiding over the present problem of aged TOAST tables though manual/scripted VACUUM is valid and we are working on that currently.
>
> Thanks and Regards,
> Subu
>
> On Mon, Mar 5, 2018 at 2:54 PM, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at <mailto:laurenz(dot)albe(at)cybertec(dot)at>> wrote:
>
> Subramanian Krishnan wrote:
> > We are running PostgreSQL 9.4.14 and recently we received an alert for transaction id
> > crossing the 1 billion mark. Since we can hit the transaction id wraparound issue for aging tables,
> > we investigated by running the following queries:
> >
> > mpsdb=> select * from txid_current();
> > txid_current
> > --------------
> > 1100826671
> > (1 row)
> >
> > mpsdb=> SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC LIMIT 20;
> > datname | age
> > -----------+------------
> > mpsdb | 1087909867
> > rdsadmin | 1000562517
> > postgres | 950697143
> > template0 | 900709601
> > template1 | 900697643
> > (5 rows)
> >
> > mpsdb=> SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
> > table_name | type | age | relfrozenxid
> > -------------------+------+------------+--------------
> > pg_toast_22985917 | t | 1042877986 | 57941433
> > pg_toast_22986301 | t | 1042877877 | 57941542
> > pg_toast_23823161 | t | 1041791836 | 59027583
> > pg_toast_23823242 | t | 1041791776 | 59027643
> > pg_toast_23904594 | t | 1041658236 | 59161183
> > pg_toast_23904629 | t | 1041658221 | 59161198
> > pg_toast_23904640 | t | 1041658208 | 59161211
> > pg_toast_23904648 | t | 1041658200 | 59161219
> > pg_toast_23904658 | t | 1041658191 | 59161228
> > pg_toast_23904666 | t | 1041658183 | 59161236
> > pg_toast_23904674 | t | 1041658175 | 59161244
> > pg_toast_23904682 | t | 1041658169 | 59161250
> > pg_toast_23904690 | t | 1041658161 | 59161258
> > pg_toast_23904698 | t | 1041658156 | 59161263
> > pg_toast_23904706 | t | 1041658149 | 59161270
> > pg_toast_23904714 | t | 1041658129 | 59161290
> > pg_toast_23904722 | t | 1041658096 | 59161323
> > pg_toast_23904730 | t | 1041658090 | 59161329
> > pg_toast_23904738 | t | 1041658085 | 59161334
> > pg_toast_23904746 | t | 1041658077 | 59161342
> > (20 rows)
> >
> >
> > mpsdb=> SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) like '%VACUUM%' ORDER BY xact_start;
> > datname | usename | pid | waiting | xact_runtime | query
> > ---------+----------+-------+---------+-----------------+------------------------------------------------------------------------
> > mpsdb | rdsadmin | 13833 | f | 00:00:14.161262 | autovacuum: VACUUM pg_toast.pg_toast_242649675 (to prevent wraparound)
> > mpsdb | rdsadmin | 14565 | f | 00:00:11.735571 | autovacuum: VACUUM pg_toast.pg_toast_242649757 (to prevent wraparound)
> > mpsdb | rdsadmin | 15315 | f | 00:00:09.303762 | autovacuum: VACUUM pg_toast.pg_toast_242649819 (to prevent wraparound)
> > mpsdb | rdsadmin | 14935 | f | 00:00:06.893078 | autovacuum: VACUUM pg_toast.pg_toast_243226657 (to prevent wraparound)
> > mpsdb | rdsadmin | 15851 | f | 00:00:04.322474 | autovacuum: VACUUM pg_toast.pg_toast_243227582 (to prevent wraparound)
> > mpsdb | rdsadmin | 15615 | f | 00:00:01.768495 | autovacuum: VACUUM pg_toast.pg_toast_243226332 (to prevent wraparound)
> >
> > It is clear that we need to vacuum the most aged pg_toast tables which are not yet picked up by the autovacuum process.
> >
> > The question which is bothering us though is:
> > As per the queries we ran the most aged transaction and table is 1042877986 (pg_toast_22985917).
> > And as per PostgreSQL pg_database documentation the most aged table/transaction in pg_class determines
> > age of the database (via the datfrozenxid value). If that is the case we would have expected the age
> > of mpsdb database to be 1042877986 (or in this range) but what we see as age is 1087909867.
> >
> > Does this mean there are other aging tables we are not aware of? If yes, how do we get to those
> > tables since we have already queried pg_class for the most aged table?
> >
> > Request any help/guidance we can get regarding this.
>
> That is indeed strange.
>
> Reading the code, I see that at the end of a VACUUM, "datfrozenxid" is set
> to the minimum value of all "relfrozenxid" for all tables in the database
> obtained with a sequential scan of "pg_class".
>
> The only exception is that if a table is found where "relfrozenxid" is in the
> future, nothing is done. The comment suggests that that should normally not
> happen, but "has been known to arise due to bugs in pg_upgrade".
>
> - Can you find tables with negative "relfrozenxid" age in "pg_class"?
>
> - The other option is that a VACUUM finished between the query of "pg_database"
> and the query of "pg_class". Is that an option?
>
> You should tune autovacuum to be more aggressive so that it gets done processing
> the tables. To get rid of the immediate problem, you could schedule a manual
> VACUUM of the tables to which the TOAST table belong.
>
> Yours,
> Laurenz Albe
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2018-03-06 13:39:11 | Re: future of pgaudit |
Previous Message | Achilleas Mantzios | 2018-03-06 13:17:38 | Re: future of pgaudit |