From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unexpected VACUUM FULL failure |
Date: | 2007-08-09 03:23:13 |
Message-ID: | 25258.1186629793@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> ... Since we've whacked the tqual.c logic around recently,
> the problem might actually lie there...
In fact, I bet this is a result of the async-commit patch. The places
where vacuum.c bleats "HEAP_MOVED_OFF was expected" are all places where
it is looking at a tuple not marked XMIN_COMMITTED; it expects that
after its first pass over the table, *every* tuple is either
XMIN_COMMITTED or one that it moved. Async commit changed tqual.c
so that tuples that are in fact known committed might not get marked
XMIN_COMMITTED right away. The patch tries to prevent this from
happening within VACUUM FULL by means of
    /*
     * VACUUM FULL assumes that all tuple states are well-known prior to
     * moving tuples around --- see comment "known dead" in repair_frag(),
     * as well as simplifications in tqual.c.  So before we start we must
     * ensure that any asynchronously-committed transactions with changes
     * against this table have been flushed to disk.  It's sufficient to do
     * this once after we've acquired AccessExclusiveLock.
     */
    XLogAsyncCommitFlush();
but I bet lunch that that's not good enough. I still haven't reproduced
it, but I'm thinking that the inexact bookkeeping we created for clog
page LSNs can leave tuples unmarked, given the right timing of
concurrent transactions.
Not sure about the best solution for this.
regards, tom lane