I have a fairly large database(approx. 1.5TB) that is backed up by a warm standby database using log shipping(PITR). This setup had been running for a couple of months when I ran into a problem on the primary DB and had to failover to the standby DB. This worked as expected.
Shortly thereafter(Sometime over the long weekend of course), Postgres shutdown the database to avoid XID wraparound data loss. I presume there were warnings in the log about running out of XIDs, but nobody noticed in time and given what transpired after that I don't think it would have mattered if they had.
As per the documentation, I started the DB in single user mode and attempted to do a full database vacuum. After this ran for about 12hours the pg_xlog directory ran out of disk space. I'm not sure I understand why anything is written to pg_xlog as part of the vacuum process, perhaps someone can enlighten me.
I next started looking at the age(refrozenxid) of the tables in my DB, and was surprised to see that over 4000 of the 5000 tables in this DB had an age over 2Billion. So thats 4000 tables representing over a terabyte of data that need to be vacuumed! I am now vacuuming those tables one at a time, which is taking a long time(This is a scripted process). So there is no way I could have vacuumed the tables quickly enough even given a warning of impending XID wraparound.
Looking through the support mailing lists(Bugs) I see some discussion about the frozenxid updates on the master not being propogated to the slave through the WAL logs, and comments from Tom, Alvaro and Heikki suggesting that they were looking into a solution for PG 8.3 and needed a way around the problem in PG 8.2.
I am currently running PG 8.2.4 on FreeBSD.
So my questions are:
1) What is the recommended way to either solve or get around this problem in PG 8.2.4?
2) Is this "problem" fixed in some more current version of Postgres? I didn't see any mention of it in release notes up to PG 8.3.3?
3) Does this mean that if you are trying to use a warm standby DB with PITR, you need to make a new base backup of your primary DB every 1.5billion transactions, or there abouts, to avoid the problem. If so, I think this should be documented in the "Caveats" section of "Continuous Archiving and Point-in-time-recovery(PITR)" section of the manual.
Regards...
Mark Sherwood
_________________________________________________________________