Re: pg_upgrade: Support for upgrading to checksums enabled

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Robert Treat <rob(at)xzilla(dot)net>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: Support for upgrading to checksums enabled
Date: 2025-03-11 10:42:20
Message-ID: 0ae64f7c-ab36-431c-9c0d-ec398c12da87@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.02.25 00:41, Robert Treat wrote:
> On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>>
>> On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
>>> The purpose of this patch is to allow using pg_upgrade between clusters that
>>> have different checksum settings. When upgrading between instances with
>>> different checksum settings, the --copy (default) mode automatically sets
>>> (or unsets) the checksum on the fly.
>>>
>>> This would be particularly useful if we switched to enabling checksums by
>>> default, as [0] proposes, but it's also useful without that.
>>
>> Given enabling checksums can be rather expensive, I think it makes sense to
>> add a way to do it during pg_upgrade versus asking folks to run
>> pg_checksums separately. I'd anticipate arguments against enabling
>> checksums automatically, but as you noted, we can move it to a separate
>> option (e.g., --copy --enable-checksums). Disabling checksums with
>> pg_checksums is fast because it just updates pg_control, so I don't see any
>> need for --disable-checkums in pg_upgrade.
>>
>
> The repercussions of either enabling or disabling checksums on
> accident is quite high (not for pg_upgrade, but for $futureDBA), so
> ISTM an explicit flag for BOTH is the right way to go. In that
> scenario, pg_upgrade would check to make sure the clusters match and
> then make the appropriate suggestion. In the case someone did
> something like --enable-checksums and --link, again, we'd toss an
> error that --copy mode is required to --enable-checksums.

Here is an updated patch that works more along those lines. It adds a
pg_upgrade option --update-checksums, which activates the code to
rewrite the checksums. You must specify this option if the source and
target clusters have different checksum settings.

Note that this also works to hypothetically upgrade between future
different checksum versions (hence "--update-*", not "--enable-*").
Also, as the patch is currently written, it is also required to specify
this option to downgrade from checksums to no-checksums. (It will then
write a zero into the checksum place, as it would look if you had never
used checksums.) Also, you can optionally specify this option even if
the checksum settings are the same, then it will recalculate the
checksums. Probably not all of this is useful, but perhaps a subset of
it. Thoughts?

Also, I still don't know what to do about the Windows code path in
copyFile(). We could just not support this feature on Windows? Or
maybe the notionally correct thing to do would be to move that code into
copyFileByRange(). But then we'd need a different default on Windows
and it would require more documentation. I don't know what to do here
and I don't have enough context to make a suggestion. But if we don't
answer this, I don't think we can move ahead with this feature.

Attachment Content-Type Size
v2-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patch text/plain 6.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-03-11 10:44:45 Re: Test to dump and restore objects left behind by regression
Previous Message Ashutosh Bapat 2025-03-11 10:38:53 Statistics import and export: difference in statistics of materialized view dumped