Quick Links

Re: Statistics Import and Export

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, alvherre(at)alvh(dot)no-ip(dot)org
Subject:	Re: Statistics Import and Export
Date:	2024-07-19 21:21:58
Message-ID:	e2f8ad0da0096d4efaf68a6a441cc1931fa5cd80.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 2024-07-18 at 02:09 -0400, Corey Huinker wrote:
> v23:
>
> Split pg_set_relation_stats into two functions: pg_set_relation_stats
> with named parameters like it had around v19 and
> pg_restore_relations_stats with the variadic parameters it has had in
> more recent versions, which processes the variadic parameters and
> then makes a call to pg_set_relation_stats.
>
> Split pg_set_attribute_stats into two functions:
> pg_set_attribute_stats with named parameters like it had around v19
> and pg_restore_attribute_stats with the variadic parameters it has
> had in more recent versions, which processes the variadic parameters
> and then makes a call to pg_set_attribute_stats.
>
> The intention here is that the named parameters signatures are easier
> for ad-hoc use, while the variadic signatures are evergreen and thus
> ideal for pg_dump/pg_upgrade.

v23-0001:

* I like the split for the reason you mention. I'm not 100% sure that
we need both, but from the standpoint of reviewing, it makes things
easier. We can always remove one at the last minute if its found to be
unnecessary. I also like the names.

* Doc build error and malformatting.

* I'm not certain that we want all changes to relation stats to be non-
transactional. Are there transactional use cases? Should it be an
option? Should it be transactional for pg_set_relation_stats() but non-
transactional for pg_restore_relation_stats()?

* The documentation for the pg_set_attribute_stats() still refers to
upgrade scenarios -- shouldn't that be in the
pg_restore_attribute_stats() docs? I imagine the pg_set variant to be
used for ad-hoc planner stuff rather than upgrades.

* For the "WARNING: stat names must be of type text" I think we need an
ERROR instead. The calling convention of name/value pairs is broken and
we can't safely continue.

* The huge list of "else if (strcmp(statname, mc_freqs_name) == 0) ..."
seems wasteful and hard to read. I think we already discussed this,
what was the reason we can't just use an array to map the arg name to
an arg position type OID?

* How much error checking did we decide is appropriate? Do we need to
check that range_length_hist is always specified with range_empty_frac,
or should we just call that the planner's problem if one is specified
and the other not? Similarly, range stats for a non-range type.

* I think most of the tests should be of pg_set_*_stats(). For
pg_restore_, we just want to know that it's translating the name/value
pairs reasonably well and throwing WARNINGs when appropriate. Then, for
pg_dump tests, it should exercise pg_restore_*_stats() more completely.

* It might help to clarify which arguments are important (like
n_distinct) vs not. I assume the difference is that it's a non-NULLable
column in pg_statistic.

* Some arguments, like the relid, just seem absolutely required, and
it's weird to just emit a WARNING and return false in that case.

* To clarify: a return of "true" means all settings were successfully
applied, whereas "false" means that some were applied and some were
unrecognized, correct? Or does it also mean that some recognized
options may not have been applied?

* pg_set_attribute_stats(): why initialize the output tuple nulls array
to false? It seems like initializing it to true would be safer.

* please use a better name for "k" and add some error checking to make
sure it doesn't overrun the available slots.

* the pg_statistic tuple is always completely replaced, but the way you
can call pg_set_attribute_stats() doesn't imply that -- calling
pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
=> ...) looks like it would just replace the most_common_vals+freqs and
leave histogram_bounds as it was, but it actually clears
histogram_bounds, right? Should we make that work or should we document
better that it doesn't?

Regards,
Jeff Davis

In response to

Re: Statistics Import and Export at 2024-07-18 06:09:26 from Corey Huinker

Responses

Re: Statistics Import and Export at 2024-07-20 01:58:33 from Corey Huinker

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David G. Johnston	2024-07-19 21:27:52	Re: behavior of GROUP BY with VOLATILE expressions
Previous Message	Doug Reynolds	2024-07-19 21:21:50	Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)