Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <mbanck(at)gmx(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2024-10-07 14:46:25
Message-ID: 642b8eeb-06ef-4aed-97a8-5297d9f67142@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I did a quick review today. First a couple minor comments:

1) monitoring.sgml

typos: number of database -> number of databases
calcuated -> calculated

2) unnecessary newline in heapam.c (no other changes)

3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345,
shadowing earlier variable

4) unnecessary comment change in bufmgr.c (no other changes)

5) unnecessary include and newline in bulk_write.c (no other changes)

6) unnecessary newline in pgstat.c (no other changes)

7) controldata.c - maybe this

if (oldctrl->data_checksum_version == 2)

should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic
constant? For "off" we use "0" which seems somewhat acceptable, but for
other values it's less obvious what the meaning is.

8) xlog_internal.h - xl_checksum_state should be added to typedefs

9) system_functions.sql - Isn't it weird that this only creates the new
pg_enable_data_checksums function, but not pg_disable_data_checksums? It
also means it doesn't revoke EXECUTE from public on it, which I guess it
probably should? Or why should this be different for the two functions?

Also the error message seems to differ:

test=> select pg_enable_data_checksums();
ERROR: permission denied for function pg_enable_data_checksums
test=> select pg_disable_data_checksums();
ERROR: must be superuser

Probably harmless, but seems a bit strange.

But there also seems to be a more serious problem with recovery. I did a
simple script that does a loop of

* start a cluster
* initialize a small pgbench database (scale 1 - 100)
* run "long" pgbench
* call pg_enable_data_checksums(), wait for it to complete
* stop the cluster with "-m immediate"
* start the cluster

And this unfortunately hits this assert:

bool
AbsorbChecksumsOnBarrier(void)
{
Assert(LocalDataChecksumVersion ==
PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
return true;
}

Based on our short discussion about this, the controlfile gets updated
right after pg_enable_data_checksums() completes. The immediate stop
however forces a recovery since the last checkpoint, which means we see
the XLOG_CHECKSUMS WAL message again, and set the barrier. And then we
exit recovery, try to start checkpointer and it trips over this, because
the control file already has the "on" value :-(

I'm not sure what's the best way to fix this. Would it be possible to
remember we saw the XLOG_CHECKSUMS during recovery, and make the assert
noop in that case? Or not set the barrier when exiting recovery. I'm not
sure the relaxed assert would remain meaningful, though. What would it
check on standbys, for example?

Maybe a better way would be to wait for a checkpoint before updating the
controlfile, similar to what we do at the beginning? Possibly even with
the same "fast=true/false" logic. That would prevent us from seeing the
XLOG_CHECKSUMS wal record with the updated flag. It would extend the
"window" where a crash would mean we have to redo the checksums, but I
don't think that matters much. For small databases who cares, and for
large databases it should not be a meaningful difference (setting the
checksums already ran over multiple checkpoints, so one checkpoint is
not a big difference).

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aya Iwata (Fujitsu) 2024-10-07 14:53:19 [WIP]Vertical Clustered Index (columnar store extension) - take2
Previous Message Nathan Bossart 2024-10-07 14:35:06 Re: Function for listing pg_wal/summaries directory