Re: vacuumdb changes for stats import/export

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: vacuumdb changes for stats import/export
Date: 2025-01-24 21:25:31
Message-ID: CAD21AoCqsXK5B2Z9szw6H4ZFPTrs0zJWp_Ga0HeHF-137VC3ig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 24, 2025 at 7:44 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
> > On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
> >> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
> >>> I suggest that we make a new thread about the vacuumdb changes and
> >>> focus this thread and patch series on the pg_dump changes (and minor
> >>> flag adjustments to pg_upgrade).
> >>>
> >>> Unless you think that the pg_dump changes should block on the vacuumdb
> >>> changes? In which case please let me know because the pg_dump changes
> >>> are otherwise close to commit.
> >>
> >> I think that is a good idea. I don't see vacuumdb blocking this.
> >
> > +1, I've been reviewing the vacuumdb portion and am planning to start a new
> > thread in the near future. IIUC the bulk of the vacuumdb changes are
> > relatively noncontroversial, we just haven't reached consensus on the user
> > interface.
>
> As promised, I'm starting a new thread for this. The original thread [0]
> has some preliminary discussion about the subject.
>
> As you may be aware, there is an ongoing effort to carry over statistics
> during pg_upgrade. Today, we encourage users to use vacuumdb to run
> ANALYZE on all relations after upgrading. There's even a special
> --analyze-in-stages option that fast-tracks an initial set of minimal
> statistics for this use-case. Once the statistics are carried over by
> pg_upgrade, there will be little need to do this, except for perhaps
> extended statistics if they aren't carried over. But there are patches in
> flight for that, too [1].
>
> This thread is dedicated to figuring out what, if anything, to do about
> vacuumdb. I see the following general categories of options:
>
> * Do nothing. Other than updating our recommended guidance for
> post-upgrade analyzing, we'd leave vacuumdb alone. While this is
> certainly a simple option, it has a couple of key drawbacks. For one,
> anyone who doesn't see the new vacuumdb guidance may continue to do
> unnecessary post-upgrade analyzes. Also, if we don't get the extended
> statistics piece completed for v18, users will have to manually construct
> ANALYZE commands for those to run post-upgrade.
>
> * Add a breaking change so that users are forced to fix any outdated
> post-upgrade scripts. This is what the attached patches do. In short,
> they add a required parameter to --analyze-in-stages that can be set to
> either "all" or "missing". The new "missing" mode generates ANALYZE
> commands for relations that are missing statistics, while the "all" mode
> does the same thing that --analyze-in-stages does today. While the
> "missing" mode might be useful outside of upgrade cases, we could also
> recommend it as a post-upgrade step if the extended statistics work
> doesn't get committed for v18.
>
> * Add a new option that will make it easy to ANALYZE any relations that are
> missing statistics, but don't make any breaking changes to existing
> post-upgrade scripts. This option isn't really strictly necessary if we
> get the extended statistics parts committed, but it could be a nice
> feature, anyway.
>
> I chose the second approach because it had the most support in the other
> thread, but I definitely wouldn't characterize it as a consensus. 0001
> simply refactors the main catalog query to its own function so that its
> results can be reused in later stages of --analyze-in-stages. This might
> require a bit more memory and make --analyze-in-stages less responsive to
> concurrent changes, but it wasn't all that responsive to begin with. 0002
> adds the new "missing" mode functionality. Note that it regenerates all
> statistics for a relation if any applicable statistics types are missing.
> It's not clear whether we can or should do any better than that. Corey and
> I put a lot of effort into the catalog query changes, and we think we've
> covered everything, but we would of course appreciate some review on that
> part.
>
> BTW as long as the basic "missing" mode idea seems reasonable, it's easy
> enough to adjust the user interface to whatever we want, and I'm happy to
> do so as needed.
>
> Finally, I think another open question is whether any of this should apply
> to --analyze and/or --analyze-only. We do recommend the latter as a
> post-upgrade step in our pg_upgrade documentation, and I could see the
> "missing" mode being useful on its own for these modes, too.
>
>Thoughts?

I've not closely reviewed the patches yet but I find that the second
approach is reasonable to me. IIUC if the extended statistics
import/export work gets committed for v18, we don't need a new option
for post-upgrade analyze. But as you mentioned, these new modes seem
useful for other use cases too. Given that it could be used outside of
post-upgrading, I think it would make more sense to apply the "all"
and "missing" modes to --analyze and --analyze-only options too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-01-24 21:37:39 Re: [PATCH] SVE popcount support
Previous Message Tom Lane 2025-01-24 21:07:40 Re: Convert sepgsql tests to TAP