Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-07-20 01:58:33
Message-ID: CADkLM=c9gnYMkQCyh1+R56VrCkVWbu8-SCpmHepkbAJHMcVuuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> * Doc build error and malformatting.
>

Looking into it.

> * I'm not certain that we want all changes to relation stats to be non-
> transactional. Are there transactional use cases? Should it be an
> option? Should it be transactional for pg_set_relation_stats() but non-
> transactional for pg_restore_relation_stats()?
>

It's non-transactional because that's how ANALYZE does it to avoid bloating
pg_class. We _could_ do it transactionally, but on restore we'd immediately
have a pg_class that was 50% bloat.

>
> * The documentation for the pg_set_attribute_stats() still refers to
> upgrade scenarios -- shouldn't that be in the
> pg_restore_attribute_stats() docs? I imagine the pg_set variant to be
> used for ad-hoc planner stuff rather than upgrades.
>

Noted.

>
> * For the "WARNING: stat names must be of type text" I think we need an
> ERROR instead. The calling convention of name/value pairs is broken and
> we can't safely continue.
>

They can't be errors, because any one error fails the whole pg_upgrade.

> * The huge list of "else if (strcmp(statname, mc_freqs_name) == 0) ..."
> seems wasteful and hard to read. I think we already discussed this,
> what was the reason we can't just use an array to map the arg name to
> an arg position type OID?
>

That was my overreaction to the dislike that the P_argname enum got in
previous reviews.

We'd need an array of struct like

argname (ex. "mc_vals")
argtypeoid (one of: int, text, real, rea[])
argtypename (name we want to call the argtypeoid (integer, text. real,
real[] about covers it).
argpos (position in the arg list of the corresponding pg_set_ function

>
> * How much error checking did we decide is appropriate? Do we need to
> check that range_length_hist is always specified with range_empty_frac,
> or should we just call that the planner's problem if one is specified
> and the other not? Similarly, range stats for a non-range type.
>

I suppose we can let that go, and leave incomplete stat pairs in there.

The big risk is that somebody packs the call with more than 5 statkinds,
which would overflow the struct.

> * I think most of the tests should be of pg_set_*_stats(). For
> pg_restore_, we just want to know that it's translating the name/value
> pairs reasonably well and throwing WARNINGs when appropriate. Then, for
> pg_dump tests, it should exercise pg_restore_*_stats() more completely.
>

I was afraid you'd suggest that, in which case I'd break up the patch into
the pg_sets and the pg_restores.

> * It might help to clarify which arguments are important (like
> n_distinct) vs not. I assume the difference is that it's a non-NULLable
> column in pg_statistic.
>

There are NOT NULL stats...now. They might not be in the future. Does that
change your opinion?

>
> * Some arguments, like the relid, just seem absolutely required, and
> it's weird to just emit a WARNING and return false in that case.

Again, we can't fail.Any one failure breaks pg_upgrade.

> * To clarify: a return of "true" means all settings were successfully
> applied, whereas "false" means that some were applied and some were
> unrecognized, correct? Or does it also mean that some recognized
> options may not have been applied?
>

True means "at least some stats were applied. False means "nothing was
modified".

> * pg_set_attribute_stats(): why initialize the output tuple nulls array
> to false? It seems like initializing it to true would be safer.
>

+1

>
> * please use a better name for "k" and add some error checking to make
> sure it doesn't overrun the available slots.
>

k was an inheritance from analzye.c, from whence the very first version was
cribbed. No objection to renaming.

> * the pg_statistic tuple is always completely replaced, but the way you
> can call pg_set_attribute_stats() doesn't imply that -- calling
> pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
> => ...) looks like it would just replace the most_common_vals+freqs and
> leave histogram_bounds as it was, but it actually clears
> histogram_bounds, right? Should we make that work or should we document
> better that it doesn't?
>

That would complicate things. How would we intentionally null-out one stat,
while leaving others unchanged? However, this points out that I didn't
re-instate the re-definition that applied the NULL defaults.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Simpson 2024-07-20 02:17:52 Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)
Previous Message David G. Johnston 2024-07-20 01:00:18 Re: documentation structure