From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Date: 2022-08-11 08:47:54
Message-ID: CAEze2WhkEMCuBM=vvGaq+F4h=p-sYA26u3ZWyW8gs2t-MQq+Kw@mail.gmail.com
Lists: pgsql-hackers
On Mon, 8 Aug 2022 at 18:48, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > Because if a subset of the pages of a relation contains more tuples
> > than your current total expected tuples in the table, you should
> > update your expectations regardless of which blocks or which number of
> > blocks you've scanned - the previous stored value is a strictly worse
> > estimation than your last measurement.
>
> The previous stored value could be -1, which represents the idea that
> we don't know the tuple density yet. So it doesn't necessarily follow
> that the new estimate is strictly better, even in this exact scenario.
>
> > A 33-block relation with first 32 1-tuple pages is still enough to
> > have a last page with 250 tuples, which would be ignored in that
> > scheme and have a total tuple count of 33 or so.
>
> The simple fact is that there is only so much we can do with the
> limited information/context that we have. Heuristics are not usually
> free of all bias. Often the bias is the whole point -- the goal can be
> to make sure that we have the bias that we know we can live with, and
> not the opposite bias, which is much worse. Details of which are
> usually very domain specific.
>
> I presented my patch with a very simple test case -- a very clear
> problem. Can you do the same for this scenario?
CREATE TABLE tst (
    id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    payload text
) WITH (fillfactor = 50); -- low fillfactor to make pages fill up fast

INSERT INTO tst (payload)
SELECT repeat('a', 5000) FROM generate_series(1, 32); -- 32 pages, each filled by one large tuple

INSERT INTO tst (payload)
SELECT repeat('a', 4); -- one small tuple on a new last page

VACUUM (VERBOSE, FREEZE) tst; -- 33 tuples on 33 pages, with lots of space left on the last page

INSERT INTO tst (payload)
SELECT repeat('a', 4) FROM generate_series(1, 63); -- now the last page holds 64 tuples

VACUUM VERBOSE tst; -- with your patch this reports only 33 tuples in total,
                    -- while the single page that was scanned contains 64
                    -- tuples and the table contains 96 tuples
> I accept that it is possible that we'll keep an old reltuples which is
> provably less accurate than doing something with the latest
> information from vacuumlazy.c. But the conditions under which this can
> happen are *very* narrow. I am not inclined to do anything about it
> for that reason.
I think I understand your reasoning, but I don't agree with the
conclusion. The attached patch 0002 does fix that skew too, at what I
consider negligible cost. 0001 is your patch with a new version
number.
I'm fine with your patch as-is, but would appreciate it if known
estimation mistakes were also fixed.
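
To illustrate the shape of the change (a hedged, self-contained sketch
with made-up names, not the attached diff; the real logic lives in
vac_estimate_reltuples() in src/backend/commands/vacuum.c): 0002
effectively adds one more condition to 0001's early return, so that a
stored value that the scan has just disproven can never win over the
fresh count. The 0.02 cutoff is illustrative only.

#include <math.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

static double
estimate_reltuples_sketch(BlockNumber total_pages,
                          BlockNumber scanned_pages,
                          double scanned_tuples,
                          BlockNumber old_rel_pages,
                          double old_rel_tuples)
{
    double      old_density;
    double      unscanned_pages;

    /* A full scan yields an exact count. */
    if (scanned_pages >= total_pages)
        return scanned_tuples;

    /* reltuples == -1 means "unknown"; just extrapolate from the scan. */
    if (old_rel_tuples < 0)
        return floor((scanned_tuples / scanned_pages) * total_pages + 0.5);

    /*
     * 0001's guard: a tiny scan of an unchanged table keeps the stored
     * estimate.  0002's addition is the last condition: if the scanned
     * subset alone already contains more tuples than the stored estimate
     * claims for the whole table, the stored value is provably too low,
     * so fall through and recompute instead of keeping it.
     */
    if (old_rel_pages == total_pages &&
        scanned_pages < (double) total_pages * 0.02 &&
        scanned_tuples <= old_rel_tuples)       /* <-- the 0002 tweak */
        return old_rel_tuples;

    /* Extrapolate: assume unscanned pages kept their old density. */
    old_density = old_rel_tuples / old_rel_pages;
    unscanned_pages = (double) total_pages - (double) scanned_pages;
    return floor(old_density * unscanned_pages + scanned_tuples + 0.5);
}

In the test case above, the one scanned page holds 64 tuples against a
stored estimate of 33, so the sketch recomputes: 1.0 tuple/page * 32
unscanned pages + 64 scanned tuples = 96, which happens to be exact here.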
An alternative solution could be double-vetting, where we ignore
tuples_scanned only if <2% of the pages AND <2% of the previously
estimated tuples were scanned.
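
In the sketch above, that early return would then read (again,
hypothetical names, and the 2% thresholds as stated):

    /*
     * "Double-vetting": keep the stored estimate only when the sample
     * is tiny by both measures, pages scanned and tuples seen.
     */
    if (old_rel_pages == total_pages &&
        scanned_pages < (double) total_pages * 0.02 &&
        scanned_tuples < old_rel_tuples * 0.02)
        return old_rel_tuples;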
Kind regards,
Matthias van de Meent
Attachments:
  v2-0001-Avoid-reltuples-distortion-in-very-small-tables.patch (application/octet-stream, 3.1 KB)
  v2-0002-Avoid-reltuples-distortion-in-very-small-tables.patch (application/octet-stream, 3.5 KB)