Re: Finding out the aging tables for vacuuming to prevent data loss on transaction id wraparound

From: Fabio Pardi <f(dot)pardi(at)portavita(dot)eu>
To: Subramanian Krishnan <subramanian(dot)kris(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Finding out the aging tables for vacuuming to prevent data loss on transaction id wraparound
Date: 2018-03-06 13:26:58
Message-ID: 54b3074f-096e-fe83-a503-e35361d85553@portavita.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

From my experience is good practice to:

- Monitor the age of your records (we use icinga/nagios for that) to know on time when things are going wrong

- Schedule a periodic VACUUM of your db, if possible. Eg: if your db is heavily used during the day but resting at night, then during the night is a good idea to run a VACUUM. This will also offload your db during the day, because less autovacuum jobs will be triggered.

As additional note, if I remember correctly, a vacuum will be triggered to prevent wraparound also if you set 'autovacuum=off' in your configuration. That specific vacuum will have priority over normal vacuums.

For statistical purposes, a good formula to understand your actual 'burn rate' is to periodically run:

SELECT * FROM txid_current();

and compare results over time to have a good estimate on how many xid you are using daily.

Regards,

Fabio

On 03/06/2018 12:54 PM, Subramanian Krishnan wrote:
> Hello Laurenz,
>
> Thank you for taking the time out to read through and problem description and responding back.
>
> After posting the question on the forum, I continued investigating possible root causes.  Through a flash of intuition I suspected if it could be a table which is strictly not a part of the database (not listed in pg_class) which could be contributing factor.
>
> pg_shdepend fitted the bill. So I did a vacuum on pg_catalog.pg_shdepend and post that re-ran the DB age query. The trick worked and the age came down to match the max age entry in pg_class. To be on the safer side, I did vacuuming of pg_shdescription and pg_shseclabel as well.
>
> Your suggestion of:
> 1) Making AV more aggressive and
> 2) Tiding over the present problem of aged TOAST tables though manual/scripted VACUUM is valid and we are working on that currently.
>
> Thanks and Regards,
> Subu
>
> On Mon, Mar 5, 2018 at 2:54 PM, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at <mailto:laurenz(dot)albe(at)cybertec(dot)at>> wrote:
>
> Subramanian Krishnan wrote:
> > We are running PostgreSQL 9.4.14 and recently we received an alert for transaction id
> > crossing the 1 billion mark. Since we can hit the transaction id wraparound issue for aging tables,
> > we investigated by running the following queries:
> >
> > mpsdb=> select * from txid_current();
> >  txid_current
> > --------------
> >    1100826671
> > (1 row)
> >
> > mpsdb=> SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC LIMIT 20;
> >   datname  |    age
> > -----------+------------
> >  mpsdb     | 1087909867
> >  rdsadmin  | 1000562517
> >  postgres  |  950697143
> >  template0 |  900709601
> >  template1 |  900697643
> > (5 rows)
> >
> > mpsdb=> SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
> >     table_name     | type |    age     | relfrozenxid
> > -------------------+------+------------+--------------
> >  pg_toast_22985917 | t    | 1042877986 |     57941433
> >  pg_toast_22986301 | t    | 1042877877 |     57941542
> >  pg_toast_23823161 | t    | 1041791836 |     59027583
> >  pg_toast_23823242 | t    | 1041791776 |     59027643
> >  pg_toast_23904594 | t    | 1041658236 |     59161183
> >  pg_toast_23904629 | t    | 1041658221 |     59161198
> >  pg_toast_23904640 | t    | 1041658208 |     59161211
> >  pg_toast_23904648 | t    | 1041658200 |     59161219
> >  pg_toast_23904658 | t    | 1041658191 |     59161228
> >  pg_toast_23904666 | t    | 1041658183 |     59161236
> >  pg_toast_23904674 | t    | 1041658175 |     59161244
> >  pg_toast_23904682 | t    | 1041658169 |     59161250
> >  pg_toast_23904690 | t    | 1041658161 |     59161258
> >  pg_toast_23904698 | t    | 1041658156 |     59161263
> >  pg_toast_23904706 | t    | 1041658149 |     59161270
> >  pg_toast_23904714 | t    | 1041658129 |     59161290
> >  pg_toast_23904722 | t    | 1041658096 |     59161323
> >  pg_toast_23904730 | t    | 1041658090 |     59161329
> >  pg_toast_23904738 | t    | 1041658085 |     59161334
> >  pg_toast_23904746 | t    | 1041658077 |     59161342
> > (20 rows)
> >
> >
> > mpsdb=> SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) like '%VACUUM%' ORDER BY xact_start;
> > datname | usename  |  pid  | waiting |  xact_runtime   |                                 query
> > ---------+----------+-------+---------+-----------------+------------------------------------------------------------------------
> >  mpsdb   | rdsadmin | 13833 | f       | 00:00:14.161262 | autovacuum: VACUUM pg_toast.pg_toast_242649675 (to prevent wraparound)
> >  mpsdb   | rdsadmin | 14565 | f       | 00:00:11.735571 | autovacuum: VACUUM pg_toast.pg_toast_242649757 (to prevent wraparound)
> >  mpsdb   | rdsadmin | 15315 | f       | 00:00:09.303762 | autovacuum: VACUUM pg_toast.pg_toast_242649819 (to prevent wraparound)
> >  mpsdb   | rdsadmin | 14935 | f       | 00:00:06.893078 | autovacuum: VACUUM pg_toast.pg_toast_243226657 (to prevent wraparound)
> >  mpsdb   | rdsadmin | 15851 | f       | 00:00:04.322474 | autovacuum: VACUUM pg_toast.pg_toast_243227582 (to prevent wraparound)
> >  mpsdb   | rdsadmin | 15615 | f       | 00:00:01.768495 | autovacuum: VACUUM pg_toast.pg_toast_243226332 (to prevent wraparound)
> >
> > It is clear that we need to vacuum the most aged pg_toast tables which are not yet picked up by the autovacuum process.
> >
> > The question which is bothering us though is:
> > As per the queries we ran the most aged transaction and table is 1042877986 (pg_toast_22985917).
> > And as per PostgreSQL pg_database documentation the most aged table/transaction in pg_class determines
> > age of the database (via the datfrozenxid value). If that is the case we would have expected the age
> > of mpsdb database to be 1042877986 (or in this range) but what we see as age is 1087909867.
> >
> > Does this mean there are other aging tables we are not aware of? If yes, how do we get to those
> > tables since we have already queried pg_class for the most aged table?
> >
> > Request any help/guidance we can get regarding this.
>
> That is indeed strange.
>
> Reading the code, I see that at the end of a VACUUM, "datfrozenxid" is set
> to the minimum value of all "relfrozenxid" for all tables in the database
> obtained with a sequential scan of "pg_class".
>
> The only exception is that if a table is found where "relfrozenxid" is in the
> future, nothing is done.  The comment suggests that that should normally not
> happen, but "has been known to arise due to bugs in pg_upgrade".
>
> - Can you find tables with negative "relfrozenxid" age in "pg_class"?
>
> - The other option is that a VACUUM finished between the query of "pg_database"
>   and the query of "pg_class".  Is that an option?
>
> You should tune autovacuum to be more aggressive so that it gets done processing
> the tables.  To get rid of the immediate problem, you could schedule a manual
> VACUUM of the tables to which the TOAST table belong.
>
> Yours,
> Laurenz Albe
>
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Simon Riggs 2018-03-06 13:39:11 Re: future of pgaudit
Previous Message Achilleas Mantzios 2018-03-06 13:17:38 Re: future of pgaudit