From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: VACUUM/ANALYZE counting of in-doubt tuples
Date: 2007-11-18 02:59:31
Message-ID: 473FAA93.7000403@pws.com.au
Lists: pgsql-hackers
Hi,
Please read the below with some skepticism. I am not an expert with
regard to statistics and vacuum internals. Hopefully it just keeps the
thinking caps moving.
Tom Lane wrote:
> There was some discussion in pgsql-performance about the problem that
> the live-and-dead-tuple counts that ANALYZE reports to the stats
> collector don't reflect insert-in-progress tuples properly:
> http://archives.postgresql.org/pgsql-performance/2007-11/msg00225.php
> I proposed a patch here:
> http://archives.postgresql.org/pgsql-patches/2007-11/msg00169.php
> to switch ANALYZE over to using HeapTupleSatisfiesVacuum() to determine
> tuple status, which would at least give it awareness of tuples that are
> in-doubt because of concurrent transactions.
>
> On further thought though, that's not the whole story, and in fact
> VACUUM itself isn't doing very well at accounting for in-doubt tuples.
> The current implementation is that whatever live and dead tuple totals
> are arrived at by a VACUUM or ANALYZE are sent to the stats collector
> and simply overwrite its counters on arrival. Meanwhile, the
> transaction that is responsible for an in-doubt tuple will send a
> stats message to increment either the live-tuple or dead-tuple count
> as appropriate when it commits or aborts. If that happens before
> the VACUUM or ANALYZE completes, the increment will get overwritten
> by VACUUM/ANALYZE's total; if afterwards, the increment will get
> added onto the total. So ideally we'd count the state change as already
> done if we knew the other transaction would commit first, otherwise
> not.
>
Given that autovacuum starts on the basis of the stats collector's
counters, the problem we are having is that one vacuum will be followed
by a second, basically useless, vacuum when a long-running insert
transaction commits.
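For reference, the counters that drive that autovacuum decision are
visible in pg_stat_user_tables. A rough sketch of the trigger condition
(assuming the stock autovacuum_vacuum_threshold of 50 and
autovacuum_vacuum_scale_factor of 0.2, and ignoring any per-table
overrides) would be something like:

  -- Roughly when autovacuum would pick a table up again, based on the
  -- collector's dead-tuple count; thresholds hard-coded at the defaults.
  SELECT s.relname,
         s.n_live_tup,
         s.n_dead_tup,
         50 + 0.2 * c.reltuples AS approx_threshold,
         s.n_dead_tup > 50 + 0.2 * c.reltuples AS would_trigger
  FROM pg_stat_user_tables s
       JOIN pg_class c ON c.oid = s.relid
  ORDER BY s.n_dead_tup DESC;

So whatever VACUUM or ANALYZE writes into n_dead_tup feeds directly back
into when the next autovacuum fires.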
> Since we don't know whether the other transaction will commit or abort,
> much less when, it seems hopeless to have any really accurate accounting
> for in-doubt tuples. We could avoid the whole problem if we gave up on
> having VACUUM/ANALYZE reset the stats counters (instead, VACUUM would
> just send an incremental dead-tuple-count reduction, and ANALYZE would
> do nothing); as indeed was already proposed by Itagaki-san. But I am
> still convinced that that cure would be worse than the disease --- if
> the counters are driven entirely by incremental updates then there is
> no way to correct errors that will inevitably creep in, and the
> accumulation of errors will allow the counts to diverge arbitrarily
> far from reality. I'm happier with the idea that the error in the
> counts can be bounded by the number of in-doubt tuples seen by the
> last VACUUM/ANALYZE.
>
> So having said all that, we need to devise heuristic rules that minimize
> the error. In that light, I suggest that the accounting rules assume
> that the transactions responsible for in-doubt tuples will finish after
> the VACUUM/ANALYZE finishes. That probably sounds wrong, because in
> most of the sorts of applications we think about, updating transactions
> are short and are likely to finish before the VACUUM/ANALYZE does. But
> what I'm thinking is that a short transaction by definition can't change
> very many rows. The cases that can seriously hurt the accuracy of our
> counts are long-running bulk inserts or updates, and for those it seems
> entirely likely that ANALYZE in particular could finish first.
>
This may be the case, but I'm going to throw in the usual "where is your
evidence?" question. Saying it doesn't actually make it so. That makes
choosing the heuristic rules, and working out what they should be, all
the more important.
A potential thought for measuring them, and possibly accounting for tuple
changes, would be:
1. At the beginning of vacuum of a table, cache the live/dead tuple
statistics
2. At the end of vacuum of the table, compare current stats with cached
version from (1).
3. Wipe out the stats row and write the vacuum values, with the changes
between (1) and (2) included, as the final stats row.
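To spell out the arithmetic I have in mind for step (3): the final row
would be the vacuum's own counts plus whatever the collector accumulated
during the run. All the names and numbers below are made up purely for
illustration:

  -- vac_* = what the vacuum itself counted; *_start/*_end = collector
  -- values cached at (1) and read back at (2).
  SELECT vac_live + (live_end - live_start) AS new_live_tup,
         vac_dead + (dead_end - dead_start) AS new_dead_tup
  FROM (SELECT 100000 AS vac_live,      0 AS vac_dead,
                98000 AS live_start, 99500 AS live_end,
                 4000 AS dead_start,  4600 AS dead_end) s;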
All transactions that committed during the vacuum run will be included
in the changes (1) -> (2). All transactions that rolled back will be in
the same boat. The problem in my logic is: vacuum starts and checks
page X, a transaction alters page X and commits, that transaction then
updates the stats, and vacuum finishes and alters the stats again. We
have double counting :( The goal is to come up with the best way to
minimize the size of the count error that could be experienced.
I suppose it all gets back to the original point: what evidence do we
have for how big the problem is, and what cases are we fixing by a
particular heuristic rule? I'm sure that if it's a major problem we can
invent all sorts of complicated algorithms to track the stats correctly
and update them at the end of the vacuum. Is that worth the performance
hit and the trouble? If we adopt the wrong solution, could we make the
situation worse for other types of workloads?
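As one crude way of gathering that evidence, the collector's counters
could be snapshotted around a manual VACUUM while a long-running insert
is in flight, and then compared against an actual row count. A sketch,
assuming a table called test_table (the name is only for illustration;
the collector's figures lag by a moment, so the second read may need a
short pause):

  -- Snapshot the collector's view before the vacuum.
  CREATE TEMP TABLE stats_before AS
    SELECT n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'test_table';

  VACUUM ANALYZE test_table;

  -- Compare the collector's view afterwards with the snapshot and with
  -- an actual count of visible rows.
  SELECT b.n_live_tup AS live_before, a.n_live_tup AS live_after,
         b.n_dead_tup AS dead_before, a.n_dead_tup AS dead_after,
         (SELECT count(*) FROM test_table) AS visible_rows
  FROM pg_stat_user_tables a, stats_before b
  WHERE a.relname = 'test_table';

The size of the gap between live_after and visible_rows, with and
without the concurrent transaction, would at least put a number on how
bad the problem is.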
> Given that assumption, then the rule should be to not count
> INSERT_IN_PROGRESS tuples at all --- they will be added to the
> appropriate count when the inserting transaction commits or aborts.
> And DELETE_IN_PROGRESS tuples should be counted as live --- if the
> deleting transaction commits, it will increment the dead-tuple count
> at that time, and if it aborts then the tuple remains live.
>
> I feel fairly comfortable with this analysis for ANALYZE, and the
> patch I posted yesterday can easily be adjusted to accommodate it.
> However, what of VACUUM? As that code stands, every non-removable
> tuple (including RECENTLY_DEAD ones) is counted as live, and the
> dead-tuples count gets reset to zero. That seems clearly bogus.
> But the other-transaction-commits-second hypothesis seems a good bit
> more dubious for VACUUM than it is for ANALYZE.
>
> Should we attempt to adjust VACUUM's accounting as well, or leave it
> for 8.4? For that matter, should adjusting ANALYZE be left for 8.4?
> Thoughts?
>
Given all my unqualified statements above, I'd be tempted to make sure we
can measure the problem with 8.3 and get a proper solution into 8.4.
ANALYZE seems safe to change now. It doesn't actually make changes to
the live/dead tuple counts (I don't think), so it is lower risk. I feel
uneasy about the vacuum stuff as we don't know the exact side effects a
change like that could have. It could have a larger impact than
suspected. And at beta3/rc1 I feel it's too late in the cycle.
Regards
Russell Smith