Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-19 20:12:12
Message-ID: CA+Tgmobf11qe_Aiqzo3uK00v37+-2bkce+exi+Mw=v5pc=6N7A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> pgstat_report_analyze() will totally override the
> tabentry->dead_tuples information that drives autovacuum.c, based on
> an estimate derived from a random sample -- which seems to me to be an
> approach that just doesn't have any sound theoretical basis.

In other words, ANALYZE sometimes (but not always) produces wrong answers.

On Wed, Jan 18, 2023 at 4:08 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> One complicating factor is that VACUUM sometimes computes an incrementally
> more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> computes something sane. I unintentionally encountered one when I was trying
> something while writing this email, reproducer attached.

In other words, VACUUM sometimes (but not always) produces wrong answers.

TL;DR: We're screwed.

I refuse to believe that any amount of math you can do on numbers that
can be arbitrarily inaccurate will result in an accurate answer
popping out the other end. Trying to update the reltuples estimate
incrementally based on an estimate derived from a non-random,
likely-to-be-skewed subset of the table is always going to produce
distortion that gets worse and worse the more times you do it. If you
could say, well, the existing estimate of, let's say, 100 tuples per
page is based on the density being 200 tuples per page in the pages I
just scanned and 50 tuples per page in the rest of the table, then you
could calculate a new estimate that keeps the value of 50 tuples per
page for the remainder of the table intact and just replaces the
estimate for the part you just scanned. But we have no way of doing
that, so we just make some linear combination of the old estimate with
the new one. That overweights the repeatedly-sampled portions of the
table more and more, making the estimate increasingly wrong.
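
To make that distortion concrete, here's a toy sketch of that kind of
linear-combination update. This is standalone C, not actual PostgreSQL
code, and the helper name, page counts, and densities are all invented
for illustration: 100 "hot" pages at 200 tuples/page that vacuum keeps
revisiting, and 900 pages at 50 tuples/page that it never touches.

/*
 * Toy model -- NOT actual PostgreSQL code -- of the incremental
 * reltuples update described above: keep the old density for the
 * pages that were not scanned, substitute the observed count for
 * the pages that were.
 */
#include <stdio.h>

#define TOTAL_PAGES     1000.0
#define DENSE_PAGES     100.0   /* pages vacuum keeps revisiting */
#define DENSE_DENSITY   200.0   /* tuples per page in that region */
#define SPARSE_DENSITY  50.0    /* tuples per page everywhere else */

/* One linear-combination update of the whole-table estimate. */
static double
update_estimate(double old_reltuples, double scanned_pages,
                double scanned_tuples)
{
    double  old_density = old_reltuples / TOTAL_PAGES;
    double  unscanned_pages = TOTAL_PAGES - scanned_pages;

    return old_density * unscanned_pages + scanned_tuples;
}

int
main(void)
{
    double  true_reltuples =
        DENSE_PAGES * DENSE_DENSITY +
        (TOTAL_PAGES - DENSE_PAGES) * SPARSE_DENSITY;   /* 65000 */
    double  estimate = true_reltuples;  /* start out exactly right */
    int     i;

    printf("true reltuples = %.0f\n\n", true_reltuples);

    /* Repeatedly "vacuum" only the dense, heavily-updated region. */
    for (i = 1; i <= 20; i++)
    {
        estimate = update_estimate(estimate, DENSE_PAGES,
                                   DENSE_PAGES * DENSE_DENSITY);
        printf("after pass %2d: estimate = %6.0f\n", i, estimate);
    }

    return 0;
}

Every individual observation fed into that loop is accurate, yet the
estimate drifts from the true 65000 toward 200000 (200 tuples/page
across all 1000 pages), the fixed point you reach by repeatedly
blending the dense region's density into the whole-table figure.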

Now, that is already quite bad. But if we accept the premise that
neither VACUUM nor ANALYZE is guaranteed to ever produce a new
actually-reliable estimate, then not only will we go progressively
more wrong as time goes by, but we have no way of ever fixing
anything. If you get a series of unreliable data points followed by a
reliable data point, you can at least get back on track when the
reliable data shows up. But it sounds like you guys are saying that
there's no guarantee that will ever happen, which is a bit like
discovering that not only do you have a hole in your gas tank but
there is no guarantee that you will arrive at a gas station ever again
regardless of distance travelled.
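
To tie that back to the toy sketch above: a single pass that scanned
every page (scanned_pages = TOTAL_PAGES) would snap the estimate
straight back to 65000, because the observed count would simply
replace the old density everywhere. The complaint here is that nothing
guarantees such a corrective pass ever arrives.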

--
Robert Haas
EDB: http://www.enterprisedb.com
