Re: Enable data checksums by default

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2024-08-08 10:11:38
Message-ID: 8f5b725d-1a6c-4ba6-a9ba-a67106fa2054@eisentraut.org
Lists: pgsql-hackers

On 07.08.24 00:46, Greg Sabino Mullane wrote:
> Currently, initdb only enables data checksums if passed the
> --data-checksums or -k argument. There was some hesitation years ago
> when this feature was first added, leading to the current situation
> where the default is off. However, many years later, there is wide
> consensus that this is an extraordinarily safe, desirable setting.
> Indeed, most (if not all) of the major commercial and open source
> Postgres systems currently turn this on by default. I posit you would be
> hard-pressed to find many systems these days in which it has NOT been
> turned on. So basically we have a de-facto standard, and I think it's
> time we flipped the switch to make it on by default.

I'm sympathetic to this proposal, but I want to raise some concerns.

My understanding was that the reason for some hesitation about adopting
data checksums was the performance impact. Not the checksumming itself,
but the overhead from hint bit logging. The last time I looked into
that, you could get performance impacts on the order of 5% tps. Maybe
that's acceptable, and you of course can turn it off if you want the
extra performance. But I think this should be discussed in this thread.

As for the claim that it's already the de-facto standard: maybe that is
approximately true for "serious" installations. But AFAICT, the popular
packagings don't enable checksums by default, so there is likely a
significant middle tier between "just trying it out" and serious
production use that doesn't have it turned on.

For those uses, this change would render pg_upgrade useless for upgrades
from an old instance with default settings to a new instance with
default settings, because pg_upgrade requires the checksum setting to
match between the two clusters. Users would then either need to
re-initdb with checksums turned back off, or I suppose run pg_checksums
on the old instance before upgrading? This is a significant additional
complication.
And packagers who have built abstractions on top of pg_upgrade (such
as Debian pg_upgradecluster) would also need to implement something to
manage this somehow.
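
To make the mismatch concrete, here is a rough sketch of the kind of
pre-flight check a wrapper around pg_upgrade might do. It only parses
the "Data page checksum version" line that pg_controldata prints for
each cluster. The function names and the returned labels are made up
for illustration; this is not how pg_upgrade or pg_upgradecluster
actually implement anything.

```python
import re


def checksum_version(controldata_output: str) -> int:
    """Extract the checksum version from pg_controldata output (0 = off)."""
    match = re.search(
        r"^Data page checksum version:\s*(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Data page checksum version' line found")
    return int(match.group(1))


def upgrade_action(old_output: str, new_output: str) -> str:
    """Return a hypothetical label describing what to do before pg_upgrade.

    The labels are illustrative only:
      "ok"                 - settings match, pg_upgrade can proceed
      "enable-on-old"      - old cluster lacks checksums, new one has them
      "reinit-new-with-k"  - old cluster has checksums, new one lacks them
    """
    old_on = checksum_version(old_output) != 0
    new_on = checksum_version(new_output) != 0
    if old_on == new_on:
        return "ok"
    if new_on:
        # Options: run pg_checksums --enable on the (shut down) old cluster,
        # or re-initdb the new cluster without checksums.
        return "enable-on-old"
    # Old cluster has checksums: re-initdb the new one with --data-checksums.
    return "reinit-new-with-k"
```

In the scenario described above (old default off, new default on), this
would report "enable-on-old", which is exactly the extra step that
users and packagers would have to deal with.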

So I think we need to think through the upgrade experience a bit more.
Unfortunately, pg_checksums hasn't gotten to the point we were perhaps
once hoping for, where you could enable checksums on a live system.
I'm thinking pg_upgrade could have a mode where it adds the checksums
during the upgrade as it copies the files (essentially a subset of
pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.
