From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org> |
Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Update guidance for running vacuumdb after pg_upgrade. |
Date: | 2025-04-22 20:54:30 |
Message-ID: | aAgCBgIh3BxkvceM@nathan |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote:
> Re: Nathan Bossart
>> Update guidance for running vacuumdb after pg_upgrade.
>>
>> Now that pg_upgrade can carry over most optimizer statistics, we
>> should recommend using vacuumdb's new --missing-stats-only option
>> to only analyze relations that are missing statistics.
>
> I've been looking at vacuumdb --missing-stats-only because Debian's
> pg_upgradecluster is using that now.
>
> I am wondering if this is really good advice in the pg_upgrade
> documentation. Sure it's nice that optimizer statistics are carried
> over by pg_upgrade, but the pg_stat_user_tables statistics are not
> carried over, and afaict these are the numbers that determine when the
> next autovacuum or autoanalyze run is going to happen. By removing the
> "please run vacuumdb on all tables" step from the pg_upgrade docs, we
> are effectively telling everyone that they should be starting with
> these numbers all 0, postponing the next run to some indeterminate
> point. Running `vacuumdb --missing-stats-only` does not fix that
> because it's skipping the tables. Is that the message we want to send?
>
> (If I am misinterpreting the situation the docs should still explain
> why this is ok.)
relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and
mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set
dead_tuples to a nonzero value, so the worst-case scenario is that it would
take longer before a vacuum is triggered based on
autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this,
I think we'd need to recommend using "vacuumdb --all --analyze-only"
instead. We could alternatively suggest first running "vacuumdb --all
--analyze-in-stages --missing-stats-only" (to fill in any missing stats)
followed by "vacuumdb --all --analyze-only" (to update dead_tuples).
However, I'm not sure how concerned to be about this. It does seem bad
that it might take longer for tables to be vacuumed for the first time
after upgrade, but I believe that's already the case for any type of
unclean shutdown (e.g., immediate shutdown, server crash, starting from a
base backup, point-in-time recovery). I see that we do recommend running
ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.
--
nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2025-04-22 21:03:29 | Re: pgsql: Update guidance for running vacuumdb after pg_upgrade. |
Previous Message | Christoph Berg | 2025-04-22 19:43:56 | Re: pgsql: Update guidance for running vacuumdb after pg_upgrade. |
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2025-04-22 21:03:29 | Re: pgsql: Update guidance for running vacuumdb after pg_upgrade. |
Previous Message | Andrew Dunstan | 2025-04-22 20:27:23 | Re: Cygwin support |