Re: Enable data checksums by default

From: Michael Banck <mbanck(at)gmx(dot)net>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2024-08-07 08:43:41
Message-ID: 66b333be.050a0220.63bcc.3648@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Aug 06, 2024 at 06:46:52PM -0400, Greg Sabino Mullane wrote:
> Please find attached a patch to enable data checksums by default.
>
> Currently, initdb only enables data checksums if passed the
> --data-checksums or -k argument. There was some hesitation years ago when
> this feature was first added, leading to the current situation where the
> default is off. However, many years later, there is wide consensus that
> this is an extraordinarily safe, desirable setting. Indeed, most (if not
> all) of the major commercial and open source Postgres systems currently
> turn this on by default. I posit you would be hard-pressed to find many
> systems these days in which it has NOT been turned on. So basically we have
> a de-facto standard, and I think it's time we flipped the switch to make it
> on by default.

[...]

> Yes, I am aware of the previous discussions on this, but the world moves
> fast - wal compression is better than in the past, vacuum is better now,
> and data-checksums being on is such a complete default in the wild, it
> feels weird and a disservice that we are not running all our tests like
> that.

I agree.

Some review on the patch:

> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
> index bdd613e77f..511f489d34 100644
> --- a/doc/src/sgml/ref/initdb.sgml
> +++ b/doc/src/sgml/ref/initdb.sgml
> @@ -267,12 +267,14 @@ PostgreSQL documentation
> <para>
> Use checksums on data pages to help detect corruption by the
> I/O system that would otherwise be silent. Enabling checksums
> - may incur a noticeable performance penalty. If set, checksums
> + may incur a small performance penalty. If set, checksums
> are calculated for all objects, in all databases. All checksum

I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
covered elsewhere in the documentation (I just looked at the patch), but
if not, it probably should be added here as a word of caution.

> failures will be reported in the
> <link linkend="monitoring-pg-stat-database-view">
> <structname>pg_stat_database</structname></link> view.
> See <xref linkend="checksums" /> for details.
> + As of version 18, checksums are enabled by default. They can be
> + disabled by use of <option>--no-data-checksums</option>.

I think we usually do not mention when a feature was added/changed, do
we? So I'd just write "(default: enabled)" or whatever is the style of
the surrounding options.

> </para>
> </listitem>
> </varlistentry>
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index f00718a015..ce7d3e99e5 100644
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
> @@ -164,7 +164,7 @@ static bool noinstructions = false;
> static bool do_sync = true;
> static bool sync_only = false;
> static bool show_setting = false;
> -static bool data_checksums = false;
> +static bool data_checksums = true;
> static char *xlog_dir = NULL;
> static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
> static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
> @@ -3121,6 +3121,7 @@ main(int argc, char *argv[])
> {"waldir", required_argument, NULL, 'X'},
> {"wal-segsize", required_argument, NULL, 12},
> {"data-checksums", no_argument, NULL, 'k'},
> + {"no-data-checksums", no_argument, NULL, 20},

Does it make sense to add -K (capital k) as a short-cut for this? I
think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
maybe that is not wider project policy.

Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-08-07 08:52:13 Re: pgsql: Introduce hash_search_with_hash_value() function
Previous Message shveta malik 2024-08-07 08:41:43 Re: Conflict detection and logging in logical replication