Re: Statistics Import and Export

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-11-19 01:42:35
Message-ID: ZzvtC9NyukjgdX0e@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 18, 2024 at 08:29:10PM -0500, Corey Huinker wrote:
> On Mon, Nov 18, 2024 at 2:47 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> You seem to be optimizing for people using pg_upgrade, and for people
> upgrading to PG 18, without adequately considering people using vacuumdb
> in non-pg_upgrade situations, and people using PG 19+.  Let me explain.
>
> This was a concern as I was polling people.
>
> A person using vacuumdb in a non-upgrade situation is, to my limited
> imagination, one of three types:
>
> 1. A person who views vacuumdb as a worthwhile janitorial task for downtimes.
> 2. A person who wants stats on a lot of recently created tables.
> 3. A person who wants better stats on a lot of recently (re)populated tables.
>
> The first group would not be using --analyze-in-stages or --analyze-only,
> because the vacuuming is a big part of it. They will be unaffected.
>
> The second group will be pleasantly surprised to learn that they no longer need
> to specify a subset of tables, as any table missing stats will get picked up.
>
> The third group would be surprised that their operation completed so quickly,
> check the docs, add in --force-analyze to their script, and re-run.

We can't design an API around who is going to be surprised. We have to
look at what the options say, what people would expect it to do, and
what it does. The reason "surprise" doesn't work in the long run is
that while PG 18 users might be surprised, PG 20 users will be confused.

> First, I see little concern here for how people who use --analyze and
> --analyze-only independent of pg_upgrade will be affected by this.
> While I recommend people decrease vacuum and analyze threshold during
> non-peak periods:
>
>         https://momjian.us/main/blogs/pgblog/2017.html#January_3_2017
>
> some people might just regenerate all statistics during non-peak periods
> using these options.  You can perhaps argue that --analyze-in-stages
> would only be used by pg_upgrade so maybe that can be adjusted more
> easily.
>
> I, personally, would be fine if this only modified --analyze-in-stages, as it
> already carries the warning:

Right, but I think we would need to rename the option to clarify what it
does, e.g. --analyze-missing-in-stages. If they use
--analyze-in-stages, they will get an error, and will then need to
reference the docs to see the new option wording, or we can suggest the
new option in the error message.

> But others felt that --analyze-only should be in the mix as well.

Again, with those other people not saying so in this thread, I can't
really comment on it --- I can only tell you what I have seen and others
are going to have to explain why they want such dramatic changes.

> No one advocated for changing the behavior of options that involve actual
> vacuuming.
>  
>
> Second, the API for what --analyze and --analyze-only do will be very
> confusing for people running, e.g., PG 20, because the average user
> reading the option name will not guess it only adds missing statistics.
>
> I think you need to rethink your approach and just accept that a mention
> of the new preserving statistic behavior of pg_upgrade, and the new
> vacuumdb API required, will be sufficient.  In summary, I think you need
> a new --compute-missing-statistics-only that can be combined with
> --analyze, --analyze-only, and --analyze-in-stages to compute only
> missing statistics, and document it in the PG 18 release notes.
>
>
> A --missing-only/--analyze-missing-in-stages option was my first idea, and it's
> definitely cleaner, but as I stated in the rejected ideas section above, when I
> reached out to others at PgConf.EU there was pretty general consensus that few
> people would actually read our documentation, and the few that have in the past
> are unlikely to read it again to discover the new option, and those people
> would have a negative impact of using --analyze-in-stages, effectively
> punishing them for having once read the documentation (or a blog post) but not
> re-read it prior to upgrade.

Again, you can't justify such changes based on discussions that are not
posted publicly here.

> So, to add non-pg_upgrade users to the outcome tree in my email from
> 2024-11-04:
>
>
> 5. Users who use vacuumdb in a non-upgrade situation and do not use either
> --analyze-in-stages or --analyze-only will be completely unaffected.
> 6. Users who use vacuumdb in a non-upgrade situation with either
> --analyze-in-stages or --analyze-only set will find that the operation
> skips tables that already have stats, and will have to add --force-analyze
> to restore previous behavior.
>
>
> That's not a great surprise for group 6, but I have to believe that group is
> smaller than group 5, and it's definitely smaller than the group of users that
> need to upgrade.

Again, a clean API is the goal, not surprise calculus.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-19 01:47:53 Re: per backend I/O statistics
Previous Message Sutou Kouhei 2024-11-19 01:31:15 Re: Make COPY format extendable: Extract COPY TO format implementations