Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: "Shinoda, Noriyoshi (SXD Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Statistics Import and Export
Date: 2024-10-22 11:09:45
Message-ID: CADkLM=frNuS6xP-PPgAis-62pzL6cF6jve9-H967003voSd6DA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
> If the relpages option contains -1 only for partitioned tables, shouldn't
> pg_set_relation_stats restrict the values that can be
>
> specified by table type? The attached patch limits the value to -1 or more
> if the target
>
> is a partition table, and 0 or more otherwise.
>
> Changing relpages to -1 on a non-partitioned table seems to significantly
> change the execution plan.
>

Short answer: It's working as intended. Significantly changing the
execution plan in weird ways is part of the intention of the function, even
if the execution plan changes for the worse. I appreciate

Longer answer:

Enforcing -1 on only partitioned tables is tricky, as it seems to be a
value for any table that has no local storage. So foreign data wrapper
tables could, in theory, also have this value. More importantly, the -1
value seems to be situational, in my experience it only happens on
partitioned tables after they have their first partition added, which means
that the current valid stat range is set according to facts that can
change. Like so :

chuinker=# select version();
version

------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.4 (Postgres.app) on aarch64-apple-darwin21.6.0, compiled by
Apple clang version 14.0.0 (clang-1400.0.29.102), 64-bit
(1 row)
chuinker=# create table part_parent (x integer) partition by range (x);
CREATE TABLE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
0
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
0
(1 row)

chuinker=# create table part_child partition of part_parent for values from
(0) TO (100);
CREATE TABLE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
0
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
-1
(1 row)

chuinker=# drop table part_child;
DROP TABLE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
-1
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid =
'part_parent'::regclass;
relpages
----------
-1
(1 row)

Prior versions (March 2024 and earlier) of this patch and the
pg_set_attribute_stats patch did have many checks to prevent importing stat
values that were "wrong" in some way. Some examples from attribute stats
import were:

* Histograms that were not monotonically nondecreasing.
* Frequency values that were out of bounds specified by other values in the
array.
* Frequency values outside of the [0.0,1.0] or [-1.0,1.0] depending on the
stat type.
* paired arrays of most-common-values and their attendant frequency array
not having the same length

All of these checks were removed based on feedback from reviewers and
committers who saw the pg_set_*_stats() functions as a fuzzing tool, so the
ability to set illogical, wildly implausible, or mathematically impossible
values was a feature, not a bug. I would suspect that they would view your
demonstration that setting impossible values on a table as proof that the
function can be used to experiment with planner scenarios. So, while I
previously would have eagerly accepted this patch as another valued
validation check, such checks don't fit with the new intention of the
functions. Still, I greatly appreciate your helping us discover ways in
which we can use this tool to make the planner do odd things.

One thing that could cause us to enforce a check like the one you submitted
would be if an invalid value caused a query to fail or a session to crash,
even then, that would probably spur a change to make the planner more
defensive rather than more checks on the set_* side.

>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniil Davydov 2024-10-22 11:25:01 Re: Do not lock temp relations
Previous Message Hayato Kuroda (Fujitsu) 2024-10-22 10:54:15 RE: Make default subscription streaming option as Parallel