Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-11-19 20:47:20
Message-ID: CADkLM=cwkxTFOawsi0FwVJH2AuRxLhJz8XF3oxipVwmy3eEhwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> I don't have a strong opinion here, but I suspect that if I was creating
> vacuumdb from scratch, I'd have suggested a --missing-only flag that would
> only work for --analyze-only/--analyze-in-stages. That way, folks can
> still regenerate statistics if they want, but we also have an answer for
> folks who use pg_upgrade and have extended statistics.

(combining responses to Bruce's para about surprise calculus and Nathan
here)

I agree that a clean API is desirable and a goal. And as I stated before, a
new flag (--analyze-missing-in-stages / --analyze-post-pgupgrade, etc) or a
flag modifier ( --missing-only ) was my first choice.

But if we're going to go that route, we have a messaging problem. We need
to reach our customers who plan to upgrade, and explain to them that the
underlying assumption behind running vacuumdb has gone away for 99% of
them, and that may be 100% in the next version, but for that 99% running
vacuumdb in the old way now actively undoes one of the major improvements
to pg_upgrade, but this one additional option keeps the benefits of the new
pg_upgrade without the drawbacks.

That, and once we have extended statistics importing on upgrade, then the
need for vacuumdb post-upgrade goes away entirely. So we'll have to
re-message the users with that news too.

I'd be in favor of this, but I have to be honest, our messaging reach is
not good, and takes years to sink in. Years in which the message will
change at least one more time. And this outreach will likely confuse users
who already weren't (and now shouldn't be) using vacuumdb. In light of
that, the big risk was that an action that some users learned to do years
ago was now actively undoing whatever gains they were supposed to get in
their upgrade downtime, and that downtime is money to them, hence the
surprise calculus.

One other possibilities we could consider:

* create a pg_stats_health_check script that lists tables missing stats,
with --fix/--fix-in-stages options, effectively replacing vacuumdb for
those purposes, and then crank up the messaging about that change. The "new
shiny" effect of a new utility that has "stats", "health", and "check" in
the name may be the search/click-bait we need to get the word out
effectively. That last sentence may sound facetious, but it isn't, it's
just accepting how search engines and eyeballs currently function. With
that in place, we can then change the vacuumdb documentation to be deter
future use in post-upgrade situations.

* move missing-stats rebuilds into pg_upgrade/pg_restore itself, and this
would give us the simpler one-time message that users should stop using
vacuumdb in upgrade situations.

* Making a concerted push to get extended stats import into v18 despite the
high-effort/low-reward nature of it, and then we can go with the simple
messaging of "Remember vacuumdb, that thing you probably weren't running
post-upgrade but should have been? Now you can stop using it!". I had
extended stats imports working back when the function took JSON input, so
it's do-able, but the difficulty lies in how to represent an array of
incomplete pg_statistic rows in a serial fashion that is cross-version
compatible.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michel Pelletier 2024-11-19 20:52:46 Re: Using Expanded Objects other than Arrays from plpgsql
Previous Message Kirill Reshke 2024-11-19 20:02:49 Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row