From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2024-07-20 01:58:33 |
Message-ID: | CADkLM=c9gnYMkQCyh1+R56VrCkVWbu8-SCpmHepkbAJHMcVuuA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> * Doc build error and malformatting.
>
Looking into it.
> * I'm not certain that we want all changes to relation stats to be non-
> transactional. Are there transactional use cases? Should it be an
> option? Should it be transactional for pg_set_relation_stats() but non-
> transactional for pg_restore_relation_stats()?
>
It's non-transactional because that's how ANALYZE does it to avoid bloating
pg_class. We _could_ do it transactionally, but on restore we'd immediately
have a pg_class that was 50% bloat.
>
> * The documentation for the pg_set_attribute_stats() still refers to
> upgrade scenarios -- shouldn't that be in the
> pg_restore_attribute_stats() docs? I imagine the pg_set variant to be
> used for ad-hoc planner stuff rather than upgrades.
>
Noted.
>
> * For the "WARNING: stat names must be of type text" I think we need an
> ERROR instead. The calling convention of name/value pairs is broken and
> we can't safely continue.
>
They can't be errors, because any one error fails the whole pg_upgrade.
> * The huge list of "else if (strcmp(statname, mc_freqs_name) == 0) ..."
> seems wasteful and hard to read. I think we already discussed this,
> what was the reason we can't just use an array to map the arg name to
> an arg position type OID?
>
That was my overreaction to the dislike that the P_argname enum got in
previous reviews.
We'd need an array of struct like
argname (ex. "mc_vals")
argtypeoid (one of: int, text, real, rea[])
argtypename (name we want to call the argtypeoid (integer, text. real,
real[] about covers it).
argpos (position in the arg list of the corresponding pg_set_ function
>
> * How much error checking did we decide is appropriate? Do we need to
> check that range_length_hist is always specified with range_empty_frac,
> or should we just call that the planner's problem if one is specified
> and the other not? Similarly, range stats for a non-range type.
>
I suppose we can let that go, and leave incomplete stat pairs in there.
The big risk is that somebody packs the call with more than 5 statkinds,
which would overflow the struct.
> * I think most of the tests should be of pg_set_*_stats(). For
> pg_restore_, we just want to know that it's translating the name/value
> pairs reasonably well and throwing WARNINGs when appropriate. Then, for
> pg_dump tests, it should exercise pg_restore_*_stats() more completely.
>
I was afraid you'd suggest that, in which case I'd break up the patch into
the pg_sets and the pg_restores.
> * It might help to clarify which arguments are important (like
> n_distinct) vs not. I assume the difference is that it's a non-NULLable
> column in pg_statistic.
>
There are NOT NULL stats...now. They might not be in the future. Does that
change your opinion?
>
> * Some arguments, like the relid, just seem absolutely required, and
> it's weird to just emit a WARNING and return false in that case.
Again, we can't fail.Any one failure breaks pg_upgrade.
> * To clarify: a return of "true" means all settings were successfully
> applied, whereas "false" means that some were applied and some were
> unrecognized, correct? Or does it also mean that some recognized
> options may not have been applied?
>
True means "at least some stats were applied. False means "nothing was
modified".
> * pg_set_attribute_stats(): why initialize the output tuple nulls array
> to false? It seems like initializing it to true would be safer.
>
+1
>
> * please use a better name for "k" and add some error checking to make
> sure it doesn't overrun the available slots.
>
k was an inheritance from analzye.c, from whence the very first version was
cribbed. No objection to renaming.
> * the pg_statistic tuple is always completely replaced, but the way you
> can call pg_set_attribute_stats() doesn't imply that -- calling
> pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
> => ...) looks like it would just replace the most_common_vals+freqs and
> leave histogram_bounds as it was, but it actually clears
> histogram_bounds, right? Should we make that work or should we document
> better that it doesn't?
>
That would complicate things. How would we intentionally null-out one stat,
while leaving others unchanged? However, this points out that I didn't
re-instate the re-definition that applied the NULL defaults.
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Simpson | 2024-07-20 02:17:52 | Re: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues) |
Previous Message | David G. Johnston | 2024-07-20 01:00:18 | Re: documentation structure |