Re: Dubious server log messages after pg_upgrade

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Dubious server log messages after pg_upgrade
Date: 2025-03-15 02:09:28
Message-ID: Z9ThWFlR71loz5T_@paquier.xyz
Lists: pgsql-hackers

On Wed, Mar 12, 2025 at 08:41:29PM -0400, Tom Lane wrote:
> So this may have been going on for quite some time without our
> noticing. The "corrupted statistics file" whine is most likely
> caused by pg_upgrade copying the old system's pgstat.stat file
> into the new installation --- is that a good idea? I have
> no idea what's causing the redo LSN complaint, but it seems
> like that might deserve closer investigation.

Playing catch-up with various things, my apologies for the lag.

We do not copy the stats file from the old node to the new one, AFAIK.
That would not work anyway: the old file would fail to load when the
new server starts because its PGSTAT_FILE_FORMAT_ID would not match.
Note that the LSN check happens after the version check.
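
To illustrate that ordering, here is a standalone sketch, not the actual
pgstat.c code. The struct, enum, and function names are made up for the
example, and it assumes the LSN comparison is a plain equality test
against the redo LSN the server expects:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
typedef enum
{
	STATS_LOAD_OK,
	STATS_BAD_FORMAT,			/* format ID mismatch, file is discarded */
	STATS_BAD_REDO				/* redo LSN mismatch, file is discarded */
} StatsLoadResult;

/* Hypothetical header layout: format ID first, then the redo LSN. */
typedef struct
{
	int32_t		format_id;
	uint64_t	redo_lsn;
} StatsFileHeader;

/*
 * Check the header in the order described above: the format (version)
 * check runs first, so a file written with another format ID is rejected
 * before its LSN is ever compared.
 */
static StatsLoadResult
check_stats_header(const StatsFileHeader *hdr,
				   int32_t expected_format_id,
				   uint64_t expected_redo)
{
	assert(hdr != NULL);

	if (hdr->format_id != expected_format_id)
		return STATS_BAD_FORMAT;
	if (hdr->redo_lsn != expected_redo)
		return STATS_BAD_REDO;
	return STATS_LOAD_OK;
}
```

With this shape, a file carrying both a stale format ID and a stale LSN
reports only the format problem, which is why a copied old-version file
would not produce the redo LSN complaint.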

The complaint comes from the direct control file manipulations that
pg_upgrade does, which are incompatible with what the stats file
stores. One example is this pg_resetwal call, which forces the new
cluster to use a redo LSN newer than anything copied from the old
cluster so that the page LSNs remain consistent:

    exec_prog(UTILITY_LOG_FILE, NULL, true, true,
              /* use timeline 1 to match controldata and no WAL history file */
              "\"%s/pg_resetwal\" -l 00000001%s \"%s\"", new_cluster.bindir,
              old_cluster.controldata.nextxlogfile + 8,
              new_cluster.pgdata);
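
For anyone puzzled by the "+ 8": a WAL segment file name is 24 hex
digits, and the first 8 are the timeline ID. Skipping 8 characters drops
the old timeline and keeps the rest of the name, which then gets the
"00000001" prefix. A minimal standalone sketch of that string handling
follows; the helper name build_resetwal_arg and the sample file name in
the usage note are invented for illustration and are not pg_upgrade code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN	24		/* 8 hex digits each: timeline, log, segment */
#define TLI_LEN			8		/* leading timeline ID portion */

/*
 * Hypothetical helper: build the value given to "pg_resetwal -l" by
 * replacing the timeline part of the old cluster's next WAL file name
 * with timeline 1 and keeping the remaining 16 hex digits.
 */
static void
build_resetwal_arg(const char *nextxlogfile, char *out, size_t outlen)
{
	assert(strlen(nextxlogfile) == WAL_FNAME_LEN);
	assert(outlen > WAL_FNAME_LEN);

	/* nextxlogfile + TLI_LEN points just past the old timeline ID */
	snprintf(out, outlen, "00000001%s", nextxlogfile + TLI_LEN);
}
```

For instance, an old-cluster value of 00000003000000020000004A would
become 00000001000000020000004A: same position in the WAL stream, but on
timeline 1.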

b860848232aa exists because I've been trying to make the handling of
the stats file more durable by forcing it to be flushed at each
checkpoint, where I've found this check to be independently useful.

Let's remove it for this release. Perhaps we will not even need this
part if we are able to rebuild the most critical stats from WAL after
a crash. That itself needs more work; one point mentioned was to
move some table stats to the level of their relfilenode(s) so that we
could catch up on the data in the startup process when recovering.
Note the bump of PGSTAT_FILE_FORMAT_ID that's required by removing
this LSN.

With all that said, please see the attached patch, which is what I'm
planning to apply.
--
Michael

Attachment Content-Type Size
0001-Revert-Add-redo-LSN-to-pgstats-files.patch text/plain 5.3 KB
