Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
Date: 2025-04-22 20:54:30
Message-ID: aAgCBgIh3BxkvceM@nathan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote:
> Re: Nathan Bossart
>> Update guidance for running vacuumdb after pg_upgrade.
>>
>> Now that pg_upgrade can carry over most optimizer statistics, we
>> should recommend using vacuumdb's new --missing-stats-only option
>> to only analyze relations that are missing statistics.
>
> I've been looking at vacuumdb --missing-stats-only because Debian's
> pg_upgradecluster is using that now.
>
> I am wondering if this is really good advice in the pg_upgrade
> documentation. Sure it's nice that optimizer statistics are carried
> over by pg_upgrade, but the pg_stat_user_tables statistics are not
> carried over, and afaict these are the numbers that determine when the
> next autovacuum or autoanalyze run is going to happen. By removing the
> "please run vacuumdb on all tables" step from the pg_upgrade docs, we
> are effectively telling everyone that they should be starting with
> these numbers all 0, postponing the next run to some indeterminate
> point. Running `vacuumdb --missing-stats-only` does not fix that
> because it's skipping the tables. Is that the message we want to send?
>
> (If I am misinterpreting the situation the docs should still explain
> why this is ok.)

relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and
mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set
dead_tuples to a nonzero value, so the worst-case scenario is that it would
take longer before a vacuum is triggered based on
autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this,
I think we'd need to recommend using "vacuumdb --all --analyze-only"
instead. We could alternatively suggest first running "vacuumdb --all
--analyze-in-stages --missing-stats-only" (to fill in any missing stats)
followed by "vacuumdb --all --analyze-only" (to update dead_tuples).

However, I'm not sure how concerned to be about this. It does seem bad
that it might take longer for tables to be vacuumed for the first time
after upgrade, but I believe that's already the case for any type of
unclean shutdown (e.g., immediate shutdown, server crash, starting from a
base backup, point-in-time recovery). I see that we do recommend running
ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.

--
nathan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Christoph Berg 2025-04-22 21:03:29 Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
Previous Message Christoph Berg 2025-04-22 19:43:56 Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2025-04-22 21:03:29 Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
Previous Message Andrew Dunstan 2025-04-22 20:27:23 Re: Cygwin support