Re: Statistics Import and Export

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-11-19 22:40:20
Message-ID: Zz0T1BENIFDnXmwf@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 19, 2024 at 03:47:20PM -0500, Corey Huinker wrote:
> I don't have a strong opinion here, but I suspect that if I was creating
> vacuumdb from scratch, I'd have suggested a --missing-only flag that would
> only work for --analyze-only/--analyze-in-stages.  That way, folks can
> still regenerate statistics if they want, but we also have an answer for
> folks who use pg_upgrade and have extended statistics.
>
>
> (combining responses to Bruce's para about surprise calculus and Nathan here)
>
> I agree that a clean API is desirable and a goal. And as I stated before, a new
> flag (--analyze-missing-in-stages / --analyze-post-pgupgrade, etc) or a flag
> modifier ( --missing-only ) was my first choice.

Yes, after a clean API is designed, you can then consider surprise
calculus. This is an issue not only for this feature, but for all
Postgres changes we consider, which is why I think it is worth stating
this clearly. If I am thinking incorrectly, we can discuss that here too.

> But if we're going to go that route, we have a messaging problem. We need to
> reach our customers who plan to upgrade, and explain to them that the
> underlying assumption behind running vacuumdb has gone away for 99% of them,
> and that may be 100% in the next version, but for that 99% running vacuumdb in
> the old way now actively undoes one of the major improvements to pg_upgrade,
> but this one additional option keeps the benefits of the new pg_upgrade without
> the drawbacks.

How much are we supposed to consider users who do not read the major
release notes? I realize we might be unrealistic to expect that from
the majority of our users, but I also don't want to contort our API to
adjust for them.

> That, and once we have extended statistics importing on upgrade, then the need
> for vacuumdb post-upgrade goes away entirely. So we'll have to re-message the
> users with that news too.
>
> I'd be in favor of this, but I have to be honest, our messaging reach is not
> good, and takes years to sink in. Years in which the message will change at
> least one more time. And this outreach will likely confuse users who already
> weren't (and now shouldn't be) using vacuumdb. In light of that, the big risk
> was that an action that some users learned to do years ago was now actively
> undoing whatever gains they were supposed to get in their upgrade downtime, and
> that downtime is money to them, hence the surprise calculus.

That is a big purpose of the major release notes. We can even list this
as an incompatibility in the sense that the procedure has changed.

> One other possibilities we could consider:
>
> * create a pg_stats_health_check script that lists tables missing stats, with
> --fix/--fix-in-stages options, effectively replacing vacuumdb for those
> purposes, and then crank up the messaging about that change. The "new shiny"
> effect of a new utility that has "stats", "health", and "check" in the name may
> be the search/click-bait we need to get the word out effectively. That last
> sentence may sound facetious, but it isn't, it's just accepting how search
> engines and eyeballs currently function. With that in place, we can then change
> the vacuumdb documentation to be deter future use in post-upgrade situations.

We used to create a script until the functionality was added to
vacuumdb. Since 99% of users will not need to do anything after
pg_upgrade, it would make sense to output the script only for the 1% of
users who need it and tell users to run it, rather than giving
instructions that are a no-op for 99% of users.

> * move missing-stats rebuilds into pg_upgrade/pg_restore itself, and this would
> give us the simpler one-time message that users should stop using vacuumdb in
> upgrade situations.

Uh, that would make pg_upgrade take longer for some users, which might
be confusing.

> * Making a concerted push to get extended stats import into v18 despite the
> high-effort/low-reward nature of it, and then we can go with the simple
> messaging of "Remember vacuumdb, that thing you probably weren't running
> post-upgrade but should have been? Now you can stop using it!". I had extended
> stats imports working back when the function took JSON input, so it's do-able,
> but the difficulty lies in how to represent an array of incomplete pg_statistic
> rows in a serial fashion that is cross-version compatible.

I am not a big fan of that at this point. If we get it, we can adjust
our API at that time, but I don't want to plan on it.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-19 22:45:40 Re: Parametrization minimum password lenght
Previous Message Masahiko Sawada 2024-11-19 22:37:56 Re: Fix an error while building test_radixtree.c with TEST_SHARED_RT