relfrozenxid not getting reset even after manual VACUUM

From: Armand du Plessis <adp(at)bank(dot)io>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: relfrozenxid not getting reset even after manual VACUUM
Date: 2013-08-05 16:00:48
Message-ID: CANf99sXffDjjeE+dH-YAkQF26bf6BAWJQ4b3aqwtKNU0jEr30g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi there,

We're running into a scenario where despite doing a manual vacuum as a
superuser the relfrozenxid for one relation now dangerously close to
wraparound is not getting reset.

It's a Postgres 9.2.3 cluster. We shutdown other access to the machine
while running the VACUUM to ensure it could complete quick enough with an
aggressive vacuum (vacuum_cost_limit 10000 and no delay). The previous
autovacuum was running for days without completing.

There's also no old transactions in either pg_prepared_xacts or
pg_stat_activity.

production=# SELECT relname, age(relfrozenxid) as xid_age FROM pg_class
WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 10;
relname | xid_age
-------------------+------------
messages | 2050996318

(This is a similar scenario to one I've posted about a while back (
http://www.postgresql.org/message-id/3238.1369055503@sss.pgh.pa.us) - In
that case I advanced the autovacuum_freeze_max_age with the intention to
replace the problematic table. That hasn't happened in time and now sitting
with a real problem)

Below is the vacuum settings at the time of the manual vacuum:

name | setting
---------------------------------+------------
autovacuum | on
autovacuum_analyze_scale_factor | 0.01
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 1750000000
autovacuum_max_workers | 2
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 50
autovacuum_vacuum_cost_limit | 200
autovacuum_vacuum_scale_factor | 2
autovacuum_vacuum_threshold | 50
log_autovacuum_min_duration | 0
vacuum_cost_delay | 0
vacuum_cost_limit | 10000
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
vacuum_defer_cleanup_age | 0
vacuum_freeze_min_age | 25000000
vacuum_freeze_table_age | 150000000

The tail-end of the vacuum log:

INFO: index "message_sender_client_ref_confirmation_index" now contains
529204260 row versions in 10752961 pages
DETAIL: 261675128 index row versions were removed.
722622 index pages have been deleted, 367793 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.24 sec.
INFO: index "messages_v2_recipient_sender" now contains 529194622 row
versions in 2632966 pages
DETAIL: 109563299 index row versions were removed.
216413 index pages have been deleted, 132791 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: index "messages_v6_recipient_created_at_type" now contains 91840693
row versions in 2312775 pages
DETAIL: 177088781 index row versions were removed.
209178 index pages have been deleted, 3045 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO: index "messages_v6_sender_created_at_type" now contains 91840716 row
versions in 2272370 pages
DETAIL: 175741424 index row versions were removed.
201936 index pages have been deleted, 2958 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO: index "messages_v6_unread_messages_count" now contains 4797806 row
versions in 29930 pages
DETAIL: 1808779 index row versions were removed.
77 index pages have been deleted, 5 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "messages": found 51093697 removable, 529671512 nonremovable row
versions in 22666431 out of 22666435 pages
DETAIL: 701995 dead row versions cannot be removed yet.
There were 716893565 unused item pointers.
0 pages are entirely empty.
CPU 1229.92s/4618.87u sec elapsed 13951.33 sec.
INFO: vacuuming "pg_toast.pg_toast_18369"
INFO: index "pg_toast_18369_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_18369": found 0 removable, 0 nonremovable row versions in
0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

If I look at the pg_stat_user_tables log I can see it's cleaned up the dead
tuples successfully:

production=# select relname, n_live_tup, n_dead_tup, last_vacuum,
last_autovacuum from pg_stat_user_tables;
relname | n_live_tup | n_dead_tup |
last_vacuum | last_autovacuum
-------------------------------+------------+------------+-------------------------------+-------------------------------
messages | 529790266 | 2046126 | 2013-08-05
13:56:07.794664+00 |

I've got a ton of entries like the below in the log since we're approaching
the end of the world:

013-08-05 15:15:07.588 UTC,,,30352,,51ffc17b.7690,2,,2013-08-05 15:15:07
UTC,9/358152441,3787657047,DEBUG,00000,"transaction ID wrap limit is
3884757767, limited by database with OID 17671",,,,,,,,,""

Doing a VACUUM on any other relation resets the XID correctly.

Any suggestions on how we can resolve this before we get shutdown? I
suspect even in shutdown mode we'll have this issue unless we can track
down the cause.

Kind regards,

Armand

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Jorge Torralba 2013-08-05 17:29:43 Simple question on fail over and config files
Previous Message Robert Burgholzer 2013-08-05 13:39:25 Re: unexpected EOF on client connection during pg_dumpall