Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-10 19:52:51
Message-ID: CADkLM=cercsx5t7SoznVBgnqFxO1U5ZE4pb-cJ6mMRRutHqg=A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 10, 2024 at 11:57 AM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
> >
> > Anyway, here's v7. Eagerly awaiting feedback.
>
> Thanks for working on this. It looks useful to have the ability to
> restore the stats after upgrade and restore. But, the exported stats
> are valid only until the next ANALYZE is run on the table. IIUC,
> postgres collects stats during VACUUM, autovacuum and ANALYZE, right?
> Perhaps there are other ways to collect stats. I'm thinking what
> problems does the user face if they are just asked to run ANALYZE on
> the tables (I'm assuming ANALYZE doesn't block concurrent access to
> the tables) instead of automatically exporting stats.
>

Correct. These stats are just as temporary as those from any other ANALYZE of
the table. Another ANALYZE will happen later, probably via autovacuum, and
will overwrite these values. This is designed to QUICKLY get stats into a
table so the database can be operational sooner. That is especially important
after an upgrade/restore, when all stats have been wiped out. Other uses could
include adapting this for postgres_fdw so that we don't have to do table
sampling on the remote table, and of course statistics injection to test the
query planner.

> 2.
> + they are replaced by the next auto-analyze. This function is used
> by
> + <command>pg_upgrade</command> and <command>pg_restore</command> to
> + convey the statistics from the old system version into the new
> one.
> + </para>
>
> Is there any demonstration of pg_set_relation_stats and
> pg_set_attribute_stats being used either in pg_upgrade or in
> pg_restore? Perhaps, having them as 0002, 0003 and so on patches might
> show real need for functions like this. It also clarifies how these
> functions pull stats from tables on the old cluster to the tables on
> the new cluster.
>

That code was adapted from do_analyze(), and yes, there is a patch for
pg_dump, but as I noted earlier it is on hold pending feedback.

>
> 3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing
> to pg_class and might affect the plans as stats can get tampered. Can
> we REVOKE the execute permissions from the public out of the box in
> src/backend/catalog/system_functions.sql? This way one can decide who
> to give permissions to.
>

You'd have to be the table owner to alter the stats. I can envision these
functions getting a special role, but they could also be fine as
superuser-only.
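To make comment #3 concrete, the lockdown Bharath describes would look roughly
like the below. This is just a sketch: the function signature is taken from
the v7 regression tests, and the role name is made up.

```sql
-- Hypothetical lockdown in system_functions.sql: revoke EXECUTE from
-- PUBLIC out of the box, so access must be granted deliberately.
REVOKE EXECUTE ON FUNCTION
    pg_set_relation_stats(regclass, float4, integer)
    FROM PUBLIC;

-- A superuser could then delegate to a maintenance role of their choosing:
GRANT EXECUTE ON FUNCTION
    pg_set_relation_stats(regclass, float4, integer)
    TO stats_admin;  -- illustrative role name
```

Whether that is preferable to the owner/superuser check inside the function
itself is exactly the open design question.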

>
> 4.
> +SELECT pg_set_relation_stats('stats_export_import.test'::regclass,
> 3.6::float4, 15000);
> + pg_set_relation_stats
> +-----------------------
> + t
> +(1 row)
> +
> +SELECT reltuples, relpages FROM pg_class WHERE oid =
> 'stats_export_import.test'::regclass;
> + reltuples | relpages
> +-----------+----------
> + 3.6 | 15000
>
> Isn't this test case showing a misuse of these functions? Table
> actually has no rows, but we are lying to the postgres optimizer on
> stats.

Consider this case. You want to know at what point the query planner will
start using a given index. You can generate dummy data for a thousand, a
million, a billion rows, and wait for that to complete, or you can just
tell the table "I say you have a billion rows, twenty million pages, etc"
and see when it changes.
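The experiment I have in mind would go something like this (the
pg_set_relation_stats signature matches the v7 patch; the table and query are
made up for illustration):

```sql
-- Hypothetical planner experiment: a small table with an indexed column.
CREATE TABLE t (id int PRIMARY KEY, payload text);

-- With honest stats the planner sees a tiny table.
EXPLAIN SELECT * FROM t WHERE id = 42;

-- Claim the table has a billion rows across twenty million pages...
SELECT pg_set_relation_stats('t'::regclass, 1000000000::float4, 20000000);

-- ...and see whether the plan changes, without generating or loading
-- a single row of dummy data.
EXPLAIN SELECT * FROM t WHERE id = 42;
```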

But again, in most cases, you're setting the values to the same values the
table had on the old database just before the restore/upgrade.

> I think altering stats of a table mustn't be that easy for the
> end user.

It's only easy for end users who happen to be the table owner or a
superuser.

> As mentioned in comment #3, permissions need to be
> tightened. In addition, we can also mark the functions pg_upgrade only
> with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore
> (or I don't know if we have a way to know within the server that the
> server is running for pg_restore).
>

I think they will have usage both in postgres_fdw and for tuning.

>
> 5. In continuation to the comment #2, is pg_dump supposed to generate
> pg_set_relation_stats and pg_set_attribute_stats statements for each
> table? When pg_dump does that , pg_restore can automatically load the
> stats.
>

Current plan is to have one TOC entry in the post-data section with a
dependency on the table/index/matview. That lets us leverage existing
filters. The TOC entry will have a series of statements in it: one
pg_set_relation_stats() call, and one pg_set_attribute_stats() call per
attribute.
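So for a two-column table the body of that single TOC entry might look
roughly like this. The table name is made up, the pg_set_relation_stats
arguments mirror the v7 regression tests, and the pg_set_attribute_stats
argument lists are deliberately elided since that signature is still in flux:

```sql
-- Hypothetical contents of one post-data TOC entry for public.t:
SELECT pg_set_relation_stats('public.t'::regclass, 3.6::float4, 15000);
SELECT pg_set_attribute_stats('public.t'::regclass, 'id', ...);
SELECT pg_set_attribute_stats('public.t'::regclass, 'payload', ...);
```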

> 9. Isn't it good to add a test case where the plan of a query on table
> after exporting the stats would remain same as that of the original
> table from which the stats are exported? IMO, this is a more realistic
> than just comparing pg_statistic of the tables because this is what an
> end-user wants eventually.
>

I'm sure we can add something like that, but query plan formats change a
lot and are greatly dependent on database configuration, so maintaining
such a test would be a lot of work.
