vacuumdb changes for stats import/export

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: vacuumdb changes for stats import/export
Date: 2025-01-24 15:44:46
Message-ID: Z5O1bpcwDrMgyrYy@nathan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
> On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
>> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
>>> I suggest that we make a new thread about the vacuumdb changes and
>>> focus this thread and patch series on the pg_dump changes (and minor
>>> flag adjustments to pg_upgrade).
>>>
>>> Unless you think that the pg_dump changes should block on the vacuumdb
>>> changes? In which case please let me know because the pg_dump changes
>>> are otherwise close to commit.
>>
>> I think that is a good idea. I don't see vacuumdb blocking this.
>
> +1, I've been reviewing the vacuumdb portion and am planning to start a new
> thread in the near future. IIUC the bulk of the vacuumdb changes are
> relatively noncontroversial, we just haven't reached consensus on the user
> interface.

As promised, I'm starting a new thread for this. The original thread [0]
has some preliminary discussion about the subject.

As you may be aware, there is an ongoing effort to carry over statistics
during pg_upgrade. Today, we encourage users to use vacuumdb to run
ANALYZE on all relations after upgrading. There's even a special
--analyze-in-stages option that fast-tracks an initial set of minimal
statistics for this use-case. Once the statistics are carried over by
pg_upgrade, there will be little need to do this, except for perhaps
extended statistics if they aren't carried over. But there are patches in
flight for that, too [1].

This thread is dedicated to figuring out what, if anything, to do about
vacuumdb. I see the following general categories of options:

* Do nothing. Other than updating our recommended guidance for
post-upgrade analyzing, we'd leave vacuumdb alone. While this is
certainly a simple option, it has a couple of key drawbacks. For one,
anyone who doesn't see the new vacuumdb guidance may continue to do
unnecessary post-upgrade analyzes. Also, if we don't get the extended
statistics piece completed for v18, users will have to manually construct
ANALYZE commands for those to run post-upgrade.

* Add a breaking change so that users are forced to fix any outdated
post-upgrade scripts. This is what the attached patches do. In short,
they add a required parameter to --analyze-in-stages that can be set to
either "all" or "missing". The new "missing" mode generates ANALYZE
commands for relations that are missing statistics, while the "all" mode
does the same thing that --analyze-in-stages does today. While the
"missing" mode might be useful outside of upgrade cases, we could also
recommend it as a post-upgrade step if the extended statistics work
doesn't get committed for v18.

* Add a new option that will make it easy to ANALYZE any relations that are
missing statistics, but don't make any breaking changes to existing
post-upgrade scripts. This option isn't really strictly necessary if we
get the extended statistics parts committed, but it could be a nice
feature, anyway.

I chose the second approach because it had the most support in the other
thread, but I definitely wouldn't characterize it as a consensus. 0001
simply refactors the main catalog query to its own function so that its
results can be reused in later stages of --analyze-in-stages. This might
require a bit more memory and make --analyze-in-stages less responsive to
concurrent changes, but it wasn't all that responsive to begin with. 0002
adds the new "missing" mode functionality. Note that it regenerates all
statistics for a relation if any applicable statistics types are missing.
It's not clear whether we can or should do any better than that. Corey and
I put a lot of effort into the catalog query changes, and we think we've
covered everything, but we would of course appreciate some review on that
part.

BTW as long as the basic "missing" mode idea seems reasonable, it's easy
enough to adjust the user interface to whatever we want, and I'm happy to
do so as needed.

Finally, I think another open question is whether any of this should apply
to --analyze and/or --analyze-only. We do recommend the latter as a
post-upgrade step in our pg_upgrade documentation, and I could see the
"missing" mode being useful on its own for these modes, too.

Thoughts?

[0] https://postgr.es/m/CADkLM%3DfR7TwH0cLREQkf5_%3DKLcOYVxJw0Km0i5MpaWeuDwVo6g%40mail.gmail.com
[1] https://postgr.es/m/CADkLM%3Ddpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho%3DqN3kX0Zg%40mail.gmail.com

--
nathan

Attachment Content-Type Size
v1-0001-vacuumdb-Save-catalog-query-results-for-analyze-i.patch text/plain 13.0 KB
v1-0002-vacuumdb-Allow-analyzing-in-stages-only-relations.patch text/plain 14.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2025-01-24 15:46:30 Re: SQL Property Graph Queries (SQL/PGQ)
Previous Message Nazir Bilal Yavuz 2025-01-24 15:29:46 Re: Show WAL write and fsync stats in pg_stat_io