From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Open issues for HOT patch |
Date: | 2007-09-18 16:10:28 |
Message-ID: | 5349.1190131828@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> * The patch makes undocumented changes that cause autovacuum's decisions
> to be driven by total estimated dead space rather than total number of
> dead tuples. Do we like this?
No one seems to have picked up on this point, but after reflection
I think there's actually a pretty big problem here. Per-page pruning
is perfectly capable of keeping dead space in check. In a system with
HOT running well, the reasons to vacuum a table will be:
1. Remove dead index entries.
2. Remove LP_DEAD line pointers.
3. Truncate off no-longer-used end pages.
4. Transfer knowledge about free space into FSM.
Pruning cannot accomplish #1, #2, or #3, and without significant changes
in the FSM infrastructure it has no hope of accomplishing #4 either. What I'm
afraid of is that steady page-level pruning will keep the amount of dead
space low, causing autovacuum never to fire, causing the indexes to
bloat indefinitely because of #1 and the table itself to bloat
indefinitely because of #2 and #4. Thus, the proposed change in
autovacuum seems badly misguided: instead of making autovacuum trigger
on things that only it can fix, it makes autovacuum trigger on something
that per-page pruning can deal with perfectly well.
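To illustrate the failure mode described above, here is a toy model in C. It is only a sketch under stated assumptions: the struct, its fields, and the function names are hypothetical and are not taken from the HOT patch or from pgstats. It models non-HOT rows only, where pruning reclaims the tuple storage but leaves the LP_DEAD line pointer and the index entry behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one table. Every name here is hypothetical. */
typedef struct ToyTable {
    long dead_bytes;          /* space held by dead tuples not yet pruned */
    long unpruned_rows;       /* dead rows whose tuple storage is still in place */
    long lp_dead;             /* line pointers left behind by pruning (point #2) */
    long dead_index_entries;  /* index entries pointing at dead rows (point #1) */
} ToyTable;

/* A non-HOT row dies: it holds space and leaves a dead index entry. */
void row_dies(ToyTable *t, long tuple_bytes)
{
    assert(tuple_bytes > 0);
    t->dead_bytes += tuple_bytes;
    t->unpruned_rows += 1;
    t->dead_index_entries += 1;
}

/*
 * Page-level pruning in this model: tuple storage is reclaimed, but each
 * pruned row keeps an LP_DEAD line pointer, and index entries are untouched.
 */
void prune(ToyTable *t)
{
    t->lp_dead += t->unpruned_rows;
    t->unpruned_rows = 0;
    t->dead_bytes = 0;
}

/* Space-driven trigger, standing in for a decision based on dead space. */
bool space_trigger_fires(const ToyTable *t, long byte_threshold)
{
    return t->dead_bytes > byte_threshold;
}
```

If every row death is followed by a prune, dead_bytes returns to zero each time, so space_trigger_fires() never returns true for any positive threshold, while lp_dead and dead_index_entries grow without bound.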
I'm inclined to think that we should continue to drive autovac off a
count of dead rows, as this is directly related to points #1 and #2,
and doesn't seem any worse for #3 and #4 than an estimate based on space
would be. Possibly it would be sensible for per-page pruning to report
a reduction in number of dead rows when it removes heap-only tuples,
but I'm not entirely sure --- any thoughts?
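For concreteness, a dead-row-driven trigger could look roughly like the sketch below. It assumes the documented shape of the autovacuum test (base threshold plus scale factor times the table's tuple count). The struct and function names are hypothetical, and report_pruned_heap_only() is only an illustration of the open question above, not something the patch is known to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-table counters; not the actual pgstats layout. */
typedef struct TableStats {
    double reltuples;    /* estimated number of rows in the table */
    double dead_tuples;  /* dead rows reported since the last vacuum */
} TableStats;

/* threshold = base + scale * reltuples */
double vacuum_threshold(const TableStats *t, double base, double scale)
{
    assert(t->reltuples >= 0.0);
    return base + scale * t->reltuples;
}

/* Vacuum when the dead-row count exceeds the threshold. */
bool needs_vacuum(const TableStats *t, double base, double scale)
{
    return t->dead_tuples > vacuum_threshold(t, base, scale);
}

/*
 * Possible pruning hook (an assumption): heap-only tuples removed by
 * pruning leave no index entry behind, so they could be subtracted
 * from the dead-row count. The count is clamped at zero.
 */
void report_pruned_heap_only(TableStats *t, double removed)
{
    t->dead_tuples -= removed;
    if (t->dead_tuples < 0.0)
        t->dead_tuples = 0.0;
}
```

With base = 50 and scale = 0.25, a table of 1000 rows would be vacuumed once more than 300 dead rows have been reported. Whether pruned heap-only tuples should reduce that count is exactly the question raised above.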
If we do this, then it's not clear that having pgstats track dead space
is worth the trouble at all. It might be of value for testing
purposes to see how well pruning is doing, but I'm unconvinced that it's
worth bloating stats messages and files to have this number in a
production system. An alternative that would serve as well for testing
would be to teach contrib/pgstattuple to measure dead space.
Comments?
regards, tom lane