Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date: 2021-12-07 19:13:15
Message-ID: CAH2-WzkLXOdzGsJU_kw1+nZo2ByNJrhZsuYSowzvd_2R=yc-uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 7, 2021 at 7:58 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that's a good observation. I think the current autovacuum
> algorithm works pretty well when things are normal. But in extreme
> scenarios it does not degrade gracefully.

+1 to all of the specific examples you go on to describe.

> > I also think that our problem is not so much that we don't have
> > accurate statistics about dead tuples (though we surely don't have
> > much accuracy). The main problem seems to be that there are various
> > specific, important ways in which the distribution of dead tuples may
> > matter (leading to various harmful biases). And so it seems reasonable
> > to fudge how we interpret dead tuples with the intention of capturing
> > some of that, as a medium term solution. Something that considers the
> > concentration of dead tuples in heap pages seems promising.
>
> I am less convinced about this part. It sort of sounds like you're
> saying - it doesn't really matter whether the numbers we gather are
> accurate, just that they produce the correct results.

That's not quite it. More like: I think that it would be reasonable to
define dead tuples abstractly, in a way that's more useful but
nevertheless cannot diverge too much from the current definition. We
should try to capture "directionality", with clear error bounds. This
won't represent the literal truth, but it will be true in spirit (or
much closer).

For example, why should we count dead heap-only tuples from earlier in
a HOT chain, even when we see no evidence that opportunistic HOT
pruning can't keep up on that page? Since we actually care about the
direction of things, not just the present state of things, we'd be
justified in completely ignoring those dead tuples. Similarly, it
might well make sense to give more weight to concentrations of LP_DEAD
items on a page -- that is a signal that things are not going well *at
the level of the page*. Not so much when you have a few LP_DEAD stubs,
but certainly when you have dozens of them on one page, or even
hundreds. And so ISTM that the conditions of the page should influence
how we interpret/count that page's dead tuples, in both directions
(interpret the page as having more dead tuples, or fewer).

We all know that there isn't a sensible answer to the question "if the
abstract cost units used in the optimizer say that one sequential page
access is 4x cheaper than one random page access, then what's the
difference between accessing 10 random pages sequentially in close
succession, versus accessing the same 10 pages randomly?". But at the
same time, we can say for sure that random is more expensive to *some*
degree, but certainly never by multiple orders of magnitude. The model
is imperfect, to be sure, but it is better than many alternative
models that are also technically possible. We need *some* cost model
for a cost-based optimizer, and it is better to be approximately
correct than exactly wrong. Again, the truly important thing is to not
be totally wrong about any one thing.

Another way of putting it is that I am willing to accept a more biased
definition of dead tuples, if that lowers the variance:

https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff

We have some certainty about what is possible in a world in which we
use the visibility map directly, and so our final estimate should
never be wildly unreasonable -- the abstract idea of dead tuples can
still be anchored to the physical reality.

As with the optimizer and its cost units, we don't have that many
degrees of freedom when autovacuum.c launches a worker, any I don't
see that changing -- we must ultimately decide to launch or not launch
an autovacuum worker (for any given table) each time the autovacuum
launcher wakes up. So we're practically forced to come up with a model
that has some abstract idea of one kind of bloat/dead tuple. I would
say that we already have one, in fact -- it's just not a very good one
because it doesn't take account of obviously-relevant factors like
HOT. It could quite easily be less wrong.

> If the
> information we currently gather wouldn't produce the right results
> even if it were fully accurate, that to me suggests that we're
> gathering the wrong information, and we should gather something else.

I think that counting dead tuples is useful, but not quite sufficient
on its own. At the same time, we still need something that works like
a threshold -- because that's just how the autovacuum.c scheduling
works. It's a go/no-go decision, regardless of how the decision is
made.

> For example, we could attack the useless-vacuuming problem by having
> each vacuum figure out - and store - the oldest XID that could
> potentially be worth using as a cutoff for vacuuming that table, and
> refusing to autovacuum that table again until we'd be able to use a
> cutoff >= that value. I suppose this would be the oldest of (a) any
> XID that exists in the table on a tuple that we judged recently dead,
> (b) any XID that is currently-running, and (c) the next XID.

I agree that that's a good idea, but it seems like something that only
augments what I'm talking about. I suppose that it might become
necessary if we get something along the lines of what we've been
discussing, though.

> I also accept that knowing the concentration of dead tuples on pages
> that contain at least 1 dead tuple could be interesting. I've felt for
> a while that it's a mistake to know how many dead tuples there are but
> not how many pages contain them, because that affects both the amount
> of I/O required to vacuum and also how much need we have to set VM
> bits.

Right. And as I keep saying, the truly important thing is to not
*completely* ignore any relevant dimension of cost. I just don't want
to ever be wildly wrong -- not even once. We can tolerate being
somewhat less accurate all the time (not that we necessarily have to
make a trade-off), but we cannot tolerate pathological behavior. Of
course I include new/theoretical pathological behaviors here (not just
the ones we know about today).

> I'm not sure I would have approached gathering that information
> in the way that you're proposing here, but I'm not deeply against it,
> either. I do think that we should try to keep it as a principle that
> whatever we do gather, we should try our best to make accurate. If
> that doesn't work well, then gather different stuff instead.

It's important to be accurate, but it's also important to be tolerant
of model error, which is inevitable. We should make pragmatic
decisions about what kinds of errors our new model will have. And it
should have at least a rudimentary ability to learn from its mistakes.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-12-07 19:19:36 Re: pg_dump versus ancient server versions
Previous Message Mark Dilger 2021-12-07 18:59:02 Re: pg_dump versus ancient server versions