Re: Changing the state of data checksums in a running cluster

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2024-10-08 20:38:36
Message-ID: DD25705F-E75F-4DCA-B49A-5578F4F55D94@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 7 Oct 2024, at 16:46, Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> I did a quick review today. First a couple minor comments:

Thanks for looking! 1-6 are all fixed.

> 7) controldata.c - maybe this
>
> if (oldctrl->data_checksum_version == 2)
>
> should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic
> constant? For "off" we use "0" which seems somewhat acceptable, but for
> other values it's less obvious what the meaning is.

It doesn't seem clean to include storage/bufpage.h in pg_upgrade, I wonder if
we should move (or mirror) the checksum versions to storage/checksum_impl.h to
make them available to frontend and backend tools?

> 8) xlog_internal.h - xl_checksum_state should be added to typedefs

Fixed.

> 9) system_functions.sql - Isn't it weird that this only creates the new
> pg_enable_data_checksums function, but not pg_disable_data_checksums?

We don't need any DEFAULT values for pg_disable_data_checksums so it doesn't
need to be created there.

> It
> also means it doesn't revoke EXECUTE from public on it, which I guess it
> probably should? Or why should this be different for the two functions?

That should however be done, so fixed.

> But there also seems to be a more serious problem with recovery. I did a
> simple script that does a loop of
>
> * start a cluster
> * initialize a small pgbench database (scale 1 - 100)
> * run "long" pgbench
> * call pg_enable_data_checksums(), wait for it to complete
> * stop the cluster with "-m immediate"
> * start the cluster
>
> And this unfortunately hits this assert:
>
> bool
> AbsorbChecksumsOnBarrier(void)
> {
> Assert(LocalDataChecksumVersion ==
> PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
> LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
> return true;
> }
>
> Based on our short discussion about this, the controlfile gets updated
> right after pg_enable_data_checksums() completes. The immediate stop
> however forces a recovery since the last checkpoint, which means we see
> the XLOG_CHECKSUMS WAL message again, and set the barrier. And then we
> exit recovery, try to start checkpointer and it trips over this, because
> the control file already has the "on" value :-(
>
> I'm not sure what's the best way to fix this. Would it be possible to
> remember we saw the XLOG_CHECKSUMS during recovery, and make the assert
> noop in that case? Or not set the barrier when exiting recovery. I'm not
> sure the relaxed assert would remain meaningful, though. What would it
> check on standbys, for example?
>
> Maybe a better way would be to wait for a checkpoint before updating the
> controlfile, similar to what we do at the beginning? Possibly even with
> the same "fast=true/false" logic. That would prevent us from seeing the
> XLOG_CHECKSUMS wal record with the updated flag. It would extend the
> "window" where a crash would mean we have to redo the checksums, but I
> don't think that matters much. For small databases who cares, and for
> large databases it should not be a meaningful difference (setting the
> checksums already ran over multiple checkpoints, so one checkpoint is
> not a big difference).

The more I think about it the more I think that updating the control file is
the wrong thing to do for this patch, it should only change the state in memory
and let the checkpoints update the controlfile. The attached fixes that and I
can no longer reproduce the assertion failure you hit.

The attached version also contains updates to the documentation, the aux proc
counter and other smaller bits of polish.

I did remove parts of the progress reporting for now since it can't be used
from the dynamic backgroundworker it seems. I need to regroup and figure out a
better way there, but I wanted to address your above find sooner rather than
wait for that.

--
Daniel Gustafsson

Attachment Content-Type Size
v4-0001-Online-enabling-and-disabling-of-data-checksums.patch application/octet-stream 135.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-10-08 20:39:57 Re: Changing the state of data checksums in a running cluster
Previous Message Andrew Dunstan 2024-10-08 20:24:37 Re: pgindent fails with perl 5.40