Re: Remaining case where reltuples can become distorted across multiple VACUUM operations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Date: 2022-08-08 16:48:18
Message-ID: CAH2-WznSniwVEQv7pT03SGytNj6EKw1-YOv4N6WZgYuWrwmZvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> Because if a subset of the pages of a relation contains more tuples
> than your current total expected tuples in the table, you should
> update your expectations regardless of which blocks or which number of
> blocks you've scanned - the previous stored value is a strictly worse
> estimation than your last measurement.

The previous stored value could be -1, which represents the idea that
we don't know the tuple density yet. So it doesn't necessarily follow
that the new estimate is strictly better, even in this exact scenario.

> A 33-block relation with first 32 1-tuple pages is still enough to
> have a last page with 250 tuples, which would be ignored in that
> scheme and have a total tuple count of 33 or so.

The simple fact is that there is only so much we can do with the
limited information/context that we have. Heuristics are not usually
free of all bias. Often the bias is the whole point -- the goal can be
to make sure that we have the bias that we know we can live with, and
not the opposite bias, which is much worse. Details of which are
usually very domain specific.

I presented my patch with a very simple test case -- a very clear
problem. Can you do the same for this scenario?

I accept that it is possible that we'll keep an old reltuples which is
provably less accurate than doing something with the latest
information from vacuumlazy.c. But the conditions under which this can
happen are *very* narrow. I am not inclined to do anything about it
for that reason.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Wong 2022-08-08 17:06:14 Re: Asking for feedback on Pgperffarm
Previous Message Justin Pryzby 2022-08-08 16:44:31 Re: 2022-08-11 release announcement draft