Re: Statistics Import and Export

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-10 15:57:22
Message-ID: CALj2ACW9TYf8Lyuy6Y4HnPB63b4G4FMOOW0BtvWjgQiYE-db4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>
> Anyway, here's v7. Eagerly awaiting feedback.

Thanks for working on this. It looks useful to have the ability to
restore the stats after upgrade and restore. But, the exported stats
are valid only until the next ANALYZE is run on the table. IIUC,
postgres collects stats during VACUUM, autovacuum and ANALYZE, right?
Perhaps there are other ways to collect stats. I'm thinking what
problems does the user face if they are just asked to run ANALYZE on
the tables (I'm assuming ANALYZE doesn't block concurrent access to
the tables) instead of automatically exporting stats.

Here are some comments on the v7 patch. I've not dived into the whole
thread, so some comments may be of type repeated or need
clarification. Please bear with me.

1. The following two are unnecessary changes in pg_proc.dat, please remove them.

--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8818,7 +8818,6 @@
{ oid => '3813', descr => 'generate XML text node',
proname => 'xmltext', proisstrict => 't', prorettype => 'xml',
proargtypes => 'text', prosrc => 'xmltext' },
-
{ oid => '2923', descr => 'map table contents to XML',
proname => 'table_to_xml', procost => '100', provolatile => 's',
proparallel => 'r', prorettype => 'xml',
@@ -12163,8 +12162,24 @@

# GiST stratnum implementations
{ oid => '8047', descr => 'GiST support',
- proname => 'gist_stratnum_identity', prorettype => 'int2',
+ proname => 'gist_stratnum_identity',prorettype => 'int2',
proargtypes => 'int2',
prosrc => 'gist_stratnum_identity' },

2.
+ they are replaced by the next auto-analyze. This function is used by
+ <command>pg_upgrade</command> and <command>pg_restore</command> to
+ convey the statistics from the old system version into the new one.
+ </para>

Is there any demonstration of pg_set_relation_stats and
pg_set_attribute_stats being used either in pg_upgrade or in
pg_restore? Perhaps, having them as 0002, 0003 and so on patches might
show real need for functions like this. It also clarifies how these
functions pull stats from tables on the old cluster to the tables on
the new cluster.

3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing
to pg_class and might affect the plans as stats can get tampered. Can
we REVOKE the execute permissions from the public out of the box in
src/backend/catalog/system_functions.sql? This way one can decide who
to give permissions to.

4.
+SELECT pg_set_relation_stats('stats_export_import.test'::regclass,
3.6::float4, 15000);
+ pg_set_relation_stats
+-----------------------
+ t
+(1 row)
+
+SELECT reltuples, relpages FROM pg_class WHERE oid =
'stats_export_import.test'::regclass;
+ reltuples | relpages
+-----------+----------
+ 3.6 | 15000

Isn't this test case showing a misuse of these functions? Table
actually has no rows, but we are lying to the postgres optimizer on
stats. I think altering stats of a table mustn't be that easy for the
end user. As mentioned in comment #3, permissions need to be
tightened. In addition, we can also mark the functions pg_upgrade only
with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore
(or I don't know if we have a way to know within the server that the
server is running for pg_restore).

5. In continuation to the comment #2, is pg_dump supposed to generate
pg_set_relation_stats and pg_set_attribute_stats statements for each
table? When pg_dump does that , pg_restore can automatically load the
stats.

6.
+/*-------------------------------------------------------------------------
* * statistics.c *
+ * IDENTIFICATION
+ * src/backend/statistics/statistics.c
+ *
+ *-------------------------------------------------------------------------

A description of what the new file statistics.c does is missing.

7. pgindent isn't happy with new file statistics.c, please check.

8.
+/*
+ * Import statistics for a given relation attribute
+ *
+ * pg_set_attribute_stats(relation Oid, attname name, stainherit bool,
+ * stanullfrac float4, stawidth int, stadistinct float4,

Having function definition in the function comment isn't necessary -
it's hard to keep it consistent with pg_proc.dat in future. If
required, one can either look at pg_proc.dat or docs.

9. Isn't it good to add a test case where the plan of a query on table
after exporting the stats would remain same as that of the original
table from which the stats are exported? IMO, this is a more realistic
than just comparing pg_statistic of the tables because this is what an
end-user wants eventually.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-03-10 16:08:24 Re: UUID v7
Previous Message Pavel Stehule 2024-03-10 15:51:19 Re: pg_dump include/exclude fails to error