From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Incomplete freezing when truncating a relation during vacuum |
Date: | 2013-11-26 12:32:44 |
Message-ID: | 20131126123244.GI23284@alap2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
A customer of ours reported that some columns were missing from
pg_attribute. Investigation showed that they were visible to sequential
but not index scans. Looking closer, the page with the missing
attributes is marked as all-visible, but the xids on the individual rows
were xids more than 2^31 in the past - so, effectively in the future and
invisible.
pg_class.relfrozenxid, pg_database.datfrozenxid looked perfectly normal,
not indicating any wraparound and were well past the xid in the
particular rows.
So evidently, something around freezing has gone wrong. We've haven't
frozen each row, but updated relfrozenxid regardless.
A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
/* Do the vacuuming */
lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
...
if (whatever)
lazy_truncate_heap(onerel, vacrelstats);
...
new_frozen_xid = FreezeLimit;
if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
new_frozen_xid = InvalidTransactionId;
but lazy_tuncate_heap() does, after it's finished truncating:
vacrelstats->rel_pages = new_rel_pages;
Which means, we might consider a partial vacuum as a vacuum that has
frozen all old rows if just enough pages have been truncated away.
This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.
This is somewhat nasty :(.
I am not sure how we could fixup the resulting corruption. In some cases
we can check for page-level all-visible bit and fixup the the individual
xids. But it's not guaranteed, although likely, that the page level all
visible bit has been set...
Comments?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2013-11-26 12:34:50 | Re: PL/Python: domain over array support |
Previous Message | Marko Tiikkaja | 2013-11-26 12:23:14 | Re: psql shows line number |