From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Single pass vacuum - take 1 |
Date: | 2011-07-21 16:17:19 |
Message-ID: | CA+TgmobCvz0XxmM-g_Wg=5VrkKqEB=mZ2G8hA7qvCjRQxFCsNQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> How does this interact with the visibility map? If you set the visibility
> map bit after vacuuming indexes, a subsequent vacuum will not visit the
> page. The second vacuum will update relindxvacxlogid/off, but it will not
> clean up the dead line pointers left behind by the first vacuum. Now the LSN
> on the page differs from the one stored in pg_class, so subsequent pruning
> will not remove the dead line pointers either.
Currently, I think we would only set the visibility map bit after
vacuuming the page for the second time. The patch as submitted
doesn't appear to go back and set visibility map bits after finishing
the index vacuum. Now, that might be nice to do, because then a
hypothetical index-only scan could start taking advantage of vacuum
having been done sooner. If we wanted to do that, we could
restructure the visibility map to store two bits per page: one to
indicate whether there is any potential work for VACUUM to do (modulo
freezing) and the other to indicate whether an index pointer could
possibly be aimed at a dead line pointer. (In fact, maybe we'd even
want to have a third bit to indicate "all tuples frozen", which would
be useful for optimizing anti-wraparound vacuum.)
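A two-bits-per-page map could be packed roughly as follows. This is a minimal sketch under stated assumptions: the flag names (`VM_NO_VACUUM_WORK`, `VM_NO_DEAD_INDEX_REFS`) and helpers are hypothetical, and the layout is not PostgreSQL's visibility map format, which at this point kept a single bit per heap page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags; two bits per heap block, packed four blocks per byte. */
#define VM_NO_VACUUM_WORK     0x01 /* no potential work for VACUUM, modulo freezing */
#define VM_NO_DEAD_INDEX_REFS 0x02 /* no index pointer can aim at a dead line pointer */

#define VM_BITS_PER_BLOCK  2
#define VM_BLOCKS_PER_BYTE (8 / VM_BITS_PER_BLOCK)
#define VM_FLAG_MASK       0x03

typedef uint32_t BlockNumber; /* same width as PostgreSQL's BlockNumber */

/* Byte of the map holding the flags for heap block blk. */
static inline uint32_t vm_byte(BlockNumber blk)
{
    return blk / VM_BLOCKS_PER_BYTE;
}

/* Bit offset of blk's flags within that byte. */
static inline uint32_t vm_shift(BlockNumber blk)
{
    return (blk % VM_BLOCKS_PER_BYTE) * VM_BITS_PER_BLOCK;
}

static inline uint8_t vm_get(const uint8_t *map, BlockNumber blk)
{
    return (uint8_t) ((map[vm_byte(blk)] >> vm_shift(blk)) & VM_FLAG_MASK);
}

/* Set flags, e.g. once VACUUM has finished the index pass for this page. */
static inline void vm_set(uint8_t *map, BlockNumber blk, uint8_t flags)
{
    assert((flags & ~VM_FLAG_MASK) == 0);
    map[vm_byte(blk)] |= (uint8_t) (flags << vm_shift(blk));
}

/* Clear flags, e.g. when a heap update dirties the page again. */
static inline void vm_clear(uint8_t *map, BlockNumber blk, uint8_t flags)
{
    assert((flags & ~VM_FLAG_MASK) == 0);
    map[vm_byte(blk)] &= (uint8_t) ~(flags << vm_shift(blk));
}
```

A third "all tuples frozen" bit would not pack evenly into a byte at three bits per page; one option would be four bits per page (`VM_BITS_PER_BLOCK` of 4 and a mask of `0x0F`).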
> I think you can sidestep that
> if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead
> of equality.
I don't think that works, because the point of storing the LSN in
pg_class is to verify that the vacuum completed the index cleanup
without error. The fact that a newer vacuum accomplished that goal
does not mean that all older ones did.
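The objection can be made concrete by comparing the two reclaim rules on the only inputs pruning sees: the page's stamp and the value in pg_class. A minimal sketch, assuming the split relindxvacxlogid/off pair is modeled as one hypothetical 64-bit `VacLsn`; the function names are illustrative and not from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the patch's relindxvacxlogid/off pair modeled as one 64-bit value. */
typedef uint64_t VacLsn;

/* Exact-match rule: reclaim only pointers stamped by the vacuum recorded in pg_class. */
static bool reclaim_if_equal(VacLsn page_lsn, VacLsn pgclass_lsn)
{
    return page_lsn == pgclass_lsn;
}

/* Relaxed rule proposed in the quoted message: any stamp not newer than pg_class's. */
static bool reclaim_if_not_newer(VacLsn page_lsn, VacLsn pgclass_lsn)
{
    return page_lsn <= pgclass_lsn;
}

/*
 * A page still stamped 100 while pg_class says 200 looks identical whether
 * the vacuum at LSN 100 finished its index cleanup or errored out partway.
 * The relaxed rule answers "reclaim" in both cases, which is unsafe in the
 * second; the exact-match rule refuses in both.
 */
```

Neither rule can tell the two histories apart from these two numbers alone; the exact-match rule simply errs on the side of leaving the dead line pointers in place.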
> Ignoring the issue stated in previous paragraph, I think you wouldn't
> actually need a 64-bit LSN. A smaller counter is enough, as wrap-around
> doesn't matter. In fact, a single bit would be enough. After a successful
> vacuum, the counter on each heap page (with dead line pointers) is N, and
> the value in pg_class is N. There are no other values on the heap, because
> vacuum will have cleaned them up. When you begin the next vacuum, it will
> stamp pages with N+1. So at any stage, there is only one of two values on
> any page, so a single bit is enough. (But as I said, that doesn't hold if
> vacuum skips some pages thanks to the visibility map)
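The single-bit argument quoted above can be sketched as a parity stamp. This assumes, as the argument itself does, that every vacuum visits every page carrying a stamp; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical one-bit stamp: after a successful vacuum N every
 * dead-vacuumed pointer carries N, and the next vacuum stamps N+1, so only
 * two values ever coexist and N mod 2 is enough to tell them apart.
 */
static inline unsigned stamp_bit(uint32_t vacuum_number)
{
    return vacuum_number & 1u;
}

/* Pruning may reclaim a pointer stamped by the vacuum recorded in pg_class. */
static inline bool parity_reclaimable(unsigned page_bit, uint32_t completed_vacuum)
{
    assert(page_bit <= 1u);
    return page_bit == stamp_bit(completed_vacuum);
}
```

If the visibility map lets a page keep stamp N while vacuums N+1 and N+2 both complete, the parity matches again, which is the caveat raised in the last sentence of the quote.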
If this can be made to work, it's a very appealing idea. The patch as
submitted uses lp_off to store a single bit, to distinguish between
dead and dead-vacuumed, but we could actually have (for greater
safety and debuggability) a 15-bit counter that just wraps around
from 32,767 to 1. (Maybe it would be wise to reserve a few counter
values, or a few bits, or both, for future projects.) That would
eliminate the need to touch PageRepairFragmentation() or use the
special space, since all the information would be in the line pointer
itself. Not having to rearrange the page to reclaim dead line
pointers is appealing, too.
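A counter kept in the line pointer itself might look like the following. This is a sketch under stated assumptions: `DemoItemId` is a local copy of the bit-field layout of PostgreSQL's `ItemIdData` (15-bit `lp_off`, 2-bit `lp_flags`, 15-bit `lp_len`), `DEMO_LP_DEAD` uses the same value as `LP_DEAD`, and treating 0 as "dead, not yet vacuumed" is an assumption consistent with the wrap from 32,767 to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of ItemIdData's bit-field layout from storage/itemid.h. */
typedef struct DemoItemId
{
    unsigned    lp_off:15,   /* reused here as the vacuum counter for dead items */
                lp_flags:2,
                lp_len:15;
} DemoItemId;

#define DEMO_LP_DEAD      3      /* same value as PostgreSQL's LP_DEAD */
#define VAC_COUNTER_MAX   0x7FFF /* 32,767: largest value a 15-bit field holds */
#define VAC_COUNTER_FIRST 1      /* assumed: 0 means "dead, not yet vacuumed" */

/* Advance the counter, wrapping from 32,767 back to 1 so it never yields 0. */
static uint16_t next_vac_counter(uint16_t counter)
{
    assert(counter <= VAC_COUNTER_MAX);
    return (counter >= VAC_COUNTER_MAX) ? VAC_COUNTER_FIRST
                                        : (uint16_t) (counter + 1);
}

/* Stamp a dead line pointer with the counter of the vacuum that collected it. */
static void mark_dead_vacuumed(DemoItemId *lp, uint16_t counter)
{
    assert(counter >= VAC_COUNTER_FIRST && counter <= VAC_COUNTER_MAX);
    lp->lp_flags = DEMO_LP_DEAD;
    lp->lp_off = counter;
    lp->lp_len = 0;
}

/* True if lp is dead and was stamped by the vacuum with this counter value. */
static bool is_dead_vacuumed_by(const DemoItemId *lp, uint16_t counter)
{
    return lp->lp_flags == DEMO_LP_DEAD && lp->lp_off == counter;
}
```

Raising `VAC_COUNTER_FIRST` above 1 would be one way to set aside a few counter values for future use, as suggested above.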
> Is there something in place to make sure that pruning uses an up-to-date
> relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
> you'll just miss the opportunity to remove some dead tuples.
This seems like a tricky problem, because it could cause us to
repeatedly fail to remove the same dead line pointers, which would be
poor. We could do something like this: after updating pg_class,
vacuum sends an interrupt to any backend that holds RowExclusiveLock
or higher on that relation. The interrupt handler just sets a flag.
If that backend does heap_page_prune() and sees the flag set, it knows
that it needs to recheck pg_class. This is a bit grotty and doesn't
completely close the race condition (the signal might not arrive in
time), but it ought to make it narrow enough not to matter in
practice.
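The flag-and-recheck scheme can be sketched with plain POSIX signals. This is an illustration only: PostgreSQL's own inter-backend signalling is not modeled, `catalog_vac_counter` is a hypothetical stand-in for the value stored in pg_class, and `SIGUSR1` is an arbitrary choice. The handler only sets a `volatile sig_atomic_t` flag, the usual async-signal-safe pattern.

```c
#include <signal.h>
#include <stdint.h>

/* Set by the handler when VACUUM reports that pg_class has been updated. */
static volatile sig_atomic_t vac_counter_changed = 0;

/* Hypothetical stand-ins: the catalog value and this backend's cached copy. */
static uint16_t catalog_vac_counter = 1;
static uint16_t cached_vac_counter = 1;

/* Interrupt handler: just set a flag; the real work happens later. */
static void handle_vacuum_done(int signo)
{
    (void) signo;
    vac_counter_changed = 1;
}

/*
 * Called from the pruning path before deciding which dead line pointers
 * can be reclaimed. The flag is cleared before re-reading, so a signal
 * that arrives during the re-read is not lost. A signal arriving after
 * this check only means one prune uses a stale value, which is the
 * remaining race mentioned above.
 */
static uint16_t current_vac_counter(void)
{
    if (vac_counter_changed)
    {
        vac_counter_changed = 0;
        cached_vac_counter = catalog_vac_counter; /* "recheck pg_class" */
    }
    return cached_vac_counter;
}
```

In use, a backend would install the handler once (for example `signal(SIGUSR1, handle_vacuum_done)`), and VACUUM would update the catalog value before signalling.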
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company