Re: Enable data checksums by default

From: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2024-08-13 14:41:44
Message-ID: CAKAnmmKsJJ6FkGrLLuZ7qi1gjA2NVuy5i1FN+QKk-pU1ksTJgw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter(at)eisentraut(dot)org>
wrote:

> My understanding was that the reason for some hesitation about adopting
> data checksums was the performance impact. Not the checksumming itself,
> but the overhead from hint bit logging. The last time I looked into that,
> you could get performance impacts on the order of 5% tps. Maybe that's
> acceptable, and you of course can turn it off if you want the extra
> performance. But I think this should be discussed in this thread.
>

Fair enough. I think the performance impact is acceptable, as evidenced by
the large number of people who turn it on. And it is easy enough to turn
it off again, either via --no-data-checksums or pg_checksums --disable.
I've come across people who have regretted not throwing a -k into their
initial initdb, but have not yet come across anyone with the opposite
regret. When I did some measurements some time ago, I found numbers well
below 5%, but of course it depends on a lot of factors.
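
For reference, a minimal sketch of the commands involved. The data
directory path and the "run" dry-run helper are made up for illustration,
and --no-data-checksums is left out because it is the option under
discussion in this thread and may not exist in the initdb you have
installed.

```shell
#!/bin/sh
# Sketch only: PGDATA_DIR is a hypothetical path; adjust for your system.
PGDATA_DIR="${PGDATA_DIR:-/tmp/pgdata-example}"

# Hypothetical helper: print each command instead of executing it,
# unless PG_SKETCH_RUN=1 is set in the environment.
run() {
    if [ "${PG_SKETCH_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Create a new cluster with checksums on (-k is short for --data-checksums).
run initdb -k -D "$PGDATA_DIR"

# Turn checksums off, or back on, for an existing cluster.
# pg_checksums only works while the server is cleanly shut down.
run pg_checksums --disable -D "$PGDATA_DIR"
run pg_checksums --enable -D "$PGDATA_DIR"
```

By default this only prints the commands; set PG_SKETCH_RUN=1 to actually
execute them against a stopped cluster.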

> About the claim that it's already the de-facto standard. Maybe that is
> approximately true for "serious" installations. But AFAICT, the popular
> packagings don't enable checksums by default, so there is likely a
> significant middle tier between "just trying it out" and serious
> production use that don't have it turned on.
>

I would push back on that "significant" a good bit. The number of Postgres
installations in the cloud is very likely to dwarf the total package
installations. Maybe not 10 years ago, but now? Maybe someone from Amazon
can share some numbers. Not that we have any way to compare against package
installs :) But anecdotally the number of people who mention RDS etc. on
the various fora has exploded.

> For those uses, this change would render pg_upgrade useless for upgrades
> from an old instance with default settings to a new instance with default
> settings. And then users would either need to re-initdb with checksums
> turned back off, or I suppose run pg_checksums on the old instance before
> upgrading? This is significant additional complication.
>

Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.

> And packagers who have built abstractions on top of pg_upgrade (such as
> Debian pg_upgradecluster) would also need to implement something to manage
> this somehow.
>

How does it deal with clusters with checksums enabled now?

> I'm thinking pg_upgrade could have a mode where it adds the checksum
> during the upgrade as it copies the files (essentially a subset
> of pg_checksums). I think that would be useful for that middle tier of
> users who just want a good default experience.
>

Hm...might be a bad experience if it forces a switch out of --link mode.
Perhaps a warning at the end of pg_upgrade that suggests running
pg_checksums on your new cluster if you want to enable checksums?

Cheers,
Greg
