Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-18 20:19:24
Message-ID: 20230118201924.lw7btds5twzkaogz@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-17 12:08:01 -0800, Peter Geoghegan wrote:
> > I think that's not the fault of relfrozenxid as a trigger, but that we simply
> > don't keep enough other stats. We should imo at least keep track of:
>
> If you assume that there is chronic undercounting of dead tuples
> (which I think is very common), then of course anything that triggers
> vacuuming is going to help with that problem -- it might be totally
> inadequate, but still make the critical difference by not allowing the
> system to become completely destabilized. I absolutely accept that
> users that are relying on that exist, and that those users ought to
> not have things get even worse -- I'm pragmatic. But overall, what we
> should be doing is fixing the real problem, which is that the dead
> tuples accounting is deeply flawed. Actually, it's not just that the
> statistics are flat out wrong; the whole model is flat-out wrong.

I think that depends on what "whole model" encompasses...

> The assumptions that work well for optimizer statistics quite simply
> do not apply here. Random sampling for this is just wrong, because
> we're not dealing with something that follows a distribution that can
> be characterized with a sufficiently large sample. With optimizer
> statistics, the entire contents of the table is itself a sample taken
> from the wider world -- so even very stale statistics can work quite
> well (assuming that the schema is well normalized). Whereas the
> autovacuum dead tuples stuff is characterized by constant change. I
> mean of course it is -- that's the whole point! The central limit
> theorem obviously just doesn't work for something like this -- we
> cannot generalize from a sample, at all.

If we were to stop dropping stats after crashes, I think we likely could
afford to stop messing with dead tuple stats during analyze. Right now it's
valuable to some degree because it's a way to reasonably quickly recover from
lost stats.

The main way to collect inserted / dead tuple info for autovacuum's benefit is
via the stats collected when making changes.

We probably ought to simply not update dead tuples after analyze if the stats
entry has information about a prior [auto]vacuum. Or at least split the
fields.

> How many dead heap-only tuples are equivalent to one LP_DEAD item?
> What about page-level concentrations, and the implication for
> line-pointer bloat? I don't have a good answer to any of these
> questions myself. And I have my doubts that there are *any* good
> answers.

Hence my suggestion to track several of these via page level stats. In the big
picture it doesn't really matter that much whether there's 10 or 100 (dead
tuples|items) on a page that needs to be removed. It matters that the page
needs to be processed.

> Even these questions are the wrong questions (they're just less wrong).

I don't agree. Nothing is going to be perfect, but you're not going to be able
to do sensible vacuum scheduling without some stats, and it's fine if those
are an approximation, as long as the approximation makes some sense.

> I'd like to use the visibility map more for stuff here, too. It is
> totally accurate about all-visible/all-frozen pages, so many of my
> complaints about statistics don't really apply. Or need not apply, at
> least. If 95% of a table's pages are all-frozen in the VM, then of
> course it's pretty unlikely to be the right time to VACUUM the table
> if it's to clean up bloat -- this is just about the most reliable
> information we have access to.

I think querying that from stats is too expensive for most things. I suggested
tracking all-frozen in pg_class. Perhaps we should also track when pages are
*removed* from the VM in pgstats, I don't think we do today. That should give
a decent picture?

> > > This sounds like a great argument in favor of suspend-and-resume as a
> > > way of handling autocancellation -- no useful work needs to be thrown
> > > away for AV to yield for a minute or two.
>
> > Hm, that seems a lot of work. Without having held a lock you don't even know
> > whether your old dead items still apply. Of course it'd improve the situation
> > substantially, if we could get it.
>
> I don't think it's all that much work, once the visibility map
> snapshot infrastructure is there.
>
> Why wouldn't your old dead items still apply?

Well, for one, the table could have been rewritten. Of course we can add the
code to deal with that, but it is definitely something to be aware of. There
might also be some oddities around indexes getting added / removed.

> > > Yeah, that's pretty bad. Maybe DROP TABLE and TRUNCATE should be
> > > special cases? Maybe they should always be able to auto cancel an
> > > autovacuum?
> >
> > Yea, I think so. It's not obvious how to best pass down that knowledge into
> > ProcSleep(). It'd have to be in the LOCALLOCK, I think. Looks like the best
> > way would be to change LockAcquireExtended() to get a flags argument instead
> > of reportMemoryError, and then we could add LOCK_ACQUIRE_INTENT_DROP &
> > LOCK_ACQUIRE_INTENT_TRUNCATE or such. Then the same for
> > RangeVarGetRelidExtended(). It already "customizes" how to lock based on RVR*
> > flags.
>
> It would be tricky, but still relatively straightforward compared to
> other things. It is often a TRUNCATE or a DROP TABLE, and we have
> nothing to lose and everything to gain by changing the rules for
> those.

Probably should also change the rules for VACUUM and VACUUM FULL / CLUSTER, if
we do it. Manual VACUUM will often be faster due to the cost limits, and
VACUUM FULL can be *considerably* faster than VACUUM once you hit bad bloat.

Greetings,

Andres Freund
