Could not finish anti-wraparound VACUUM when stop limit is reached

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Could not finish anti-wraparound VACUUM when stop limit is reached
Date: 2014-05-25 15:40:09
Message-ID: 53820ED9.3010003@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While debugging the B-tree bug that Jeff Janes reported
(http://www.postgresql.org/message-id/CAMkU=1y=VwF07Ay+Cpqk_7FpiHRctmssV9y99SBGhitkXPbf8g@mail.gmail.com)
a new issue came up:

If you reach the xidStopLimit, and try to run VACUUM, it fails with error:

jjanes=# vacuum;
ERROR: database is not accepting commands to avoid wraparound data loss
in database "jjanes"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

The backtrace looks like this:

#0 errstart (elevel=20, filename=0x9590a0 "varsup.c", lineno=120,
funcname=0x9593f0 <__func__.10455> "GetNewTransactionId",
domain=0x0) at elog.c:249
#1 0x00000000004f7c14 in GetNewTransactionId (isSubXact=0 '\000') at
varsup.c:115
#2 0x00000000004f86db in AssignTransactionId (s=0xd62900
<TopTransactionStateData>)
at xact.c:510
#3 0x00000000004f84a4 in GetCurrentTransactionId () at xact.c:382
#4 0x000000000062dc1c in vac_truncate_clog (frozenXID=1493663893,
minMulti=1)
at vacuum.c:909
#5 0x000000000062dc06 in vac_update_datfrozenxid () at vacuum.c:888
#6 0x000000000062cdf6 in vacuum (vacstmt=0x29e05e0, relid=0, do_toast=1
'\001',
bstrategy=0x2a5cc38, for_wraparound=0 '\000', isTopLevel=1 '\001')
at vacuum.c:294
#7 0x00000000007a3c55 in standard_ProcessUtility (parsetree=0x29e05e0,
queryString=0x29dfbf8 "vacuum ;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0,
dest=0x29e0988, completionTag=0x7fff9411a490 "") at utility.c:645

So, vac_truncate_clog() tries to get a new transaction ID, which fails
because we've already reached the stop-limit. vac_truncate_clog()
doesn't really need a new XID to be assigned, though, it only uses it to
compare against datfrozenxid to see if wrap-around has already happened,
so it could use ReadNewTransactionId() instead.

Jeff's database seems to have wrapped around already, because after
fixing the above, I get this:

jjanes=# vacuum;
WARNING: some databases have not been vacuumed in over 2 billion
transactions
DETAIL: You might have already suffered transaction-wraparound data loss.
VACUUM

We do not truncate clog when wraparound has already happened, so we
never get past that point. Jeff advanced XID counter aggressively with
some custom C code, so hitting the actual wrap-around is a case of
"don't do that". Still, the case is quite peculiar: pg_controldata says
that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that,
1593661139. So ISTM he managed to not just wrap around, but execute 2
billion more transactions after the wraparound and reach datfrozenxid
again. I'm not sure how that happened.

- Heikki

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-05-25 15:45:11 Re: pg9.4b1: unhelpful error message when creating a collation
Previous Message Heikki Linnakangas 2014-05-25 14:16:29 Re: 9.4 btree index corruption