Changing the state of data checksums in a running cluster

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Changing the state of data checksums in a running cluster
Date: 2024-07-03 06:41:01
Message-ID: E07A611B-9CF3-4FDB-8CE8-A221E39040EC@yesql.se
Lists: pgsql-hackers

After some off-list discussion about the desirability of this feature, where
several hackers opined that it's something that we should have, I've decided to
rebase this patch and submit it one more time. There are several (long)
threads covering the history of this patch [0][1], related work stemming from
this [2] as well as earlier attempts and discussions [3][4]. Below I try to
respond to a summary of points raised in those threads.

The mechanics of the patch haven't changed since the last posted version; it
has mainly been polished slightly. A high-level overview of the processing is:
it uses a launcher/worker model where the launcher spawns a worker per
database, which traverses all pages and dirties them in order to calculate and
set the checksum on them. During this inprogress state all backends calculate
and write checksums but don't verify them on read. Once all pages have been
checksummed, the state of the cluster switches over to "on", synchronized
across all backends with a procsignalbarrier. At this point checksums are
verified and processing is equal to checksums having been enabled at initdb.
When a user disables checksums, the cluster enters a state where all backends
still write checksums until all backends have acknowledged that they have
stopped verifying checksums (again using a procsignalbarrier). At this point
the cluster switches to "off" and checksums are neither written nor verified.
If the cluster is restarted, voluntarily or via a crash, processing will have
to be restarted (more on that further down).
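
To make the overview above easier to follow, here is a minimal sketch of the
four states and what backends do in each of them. It is a toy model written for
this summary, not code from the patch: the event names ("enable",
"all_pages_checksummed", and so on) and all Python identifiers are hypothetical,
and only the transitions described above are modelled. What happens on restart
or when a transition is interrupted is deliberately left out.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"
    INPROGRESS_ON = "inprogress-on"
    ON = "on"
    INPROGRESS_OFF = "inprogress-off"


# For each state: (write checksums on page write, verify checksums on read).
BEHAVIOUR = {
    ChecksumState.OFF: (False, False),
    ChecksumState.INPROGRESS_ON: (True, False),
    ChecksumState.ON: (True, True),
    ChecksumState.INPROGRESS_OFF: (True, False),
}

# Hypothetical event names; only the transitions from the overview are listed.
TRANSITIONS = {
    (ChecksumState.OFF, "enable"): ChecksumState.INPROGRESS_ON,
    (ChecksumState.INPROGRESS_ON, "all_pages_checksummed"): ChecksumState.ON,
    (ChecksumState.ON, "disable"): ChecksumState.INPROGRESS_OFF,
    (ChecksumState.INPROGRESS_OFF, "all_backends_acknowledged"): ChecksumState.OFF,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"transition not modelled: {state.value} + {event}"
        ) from None


def writes_checksums(state):
    return BEHAVIOUR[state][0]


def verifies_checksums(state):
    return BEHAVIOUR[state][1]
```

The point of the table is that verification is only ever on in the "on" state,
while checksums keep being written in both in-between states.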

The user facing controls for this are two SQL level functions, for enabling and
disabling. The existing data_checksums GUC remains but is expanded with more
possible states (with on/off retained).

Complaints against earlier versions
===================================
Seasoned hackers might remember that this patch has been on -hackers before.
There has been a lot of review, and AFAICT all specific comments have been
addressed. There are however a few larger more generic complaints:

* Restartability - the initial version of the patch did not support stateful
restarts: a shutdown (or crash) before checksums were fully enabled would
result in a need to start over from the beginning. This was deemed the safe
orchestration method. The lack of this feature was seen as a serious drawback,
so it was added. Subsequent review instead found the patch to be too
complicated, with too large a feature set. I think there is merit to both of
these arguments: being able to restart is a great feature, and being able to
reason about the correctness of a smaller patch is also great. As of this
submission I have removed the ability to restart to keep the scope of the patch
small (this is also where the previous version was; it received no review after
the removal). The way I prefer to frame this is to first add scaffolding and
infrastructure (this patch) and leave refinements and add-on features
(restartability, but also others like parallel workers, optimizing rare cases,
etc) for follow-up patches.

* Complexity - it was brought up that this is a very complex patch for a niche
feature, and there is a lot of truth to that. It is inherently complex to
change a pg_control level state of a running cluster. There might be ways to
make the current patch less complex, while not sacrificing stability, and if so
that would be great. A lot of the complexity came from being able to
restart processing, and that's now removed for this version, but it's clearly
not close to a one-line-diff even without it.

Other complaints were addressed, in part by the invention of procsignalbarriers
which make this synchronization possible. In re-reading the threads I might
have missed something which is still left open, and if so I do apologize for
that.
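
For readers unfamiliar with the barrier idea, the following is a rough,
single-process sketch of the synchronization pattern: a new state is published
with a generation number, each backend absorbs it at a point of its own
choosing, and the state change is considered complete only once every backend
has caught up. This is an illustration under simplifying assumptions, not the
ProcSignalBarrier implementation; the class and method names are hypothetical.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.seen_generation = 0   # last barrier generation this backend absorbed
        self.local_state = "off"   # this backend's view of the checksum state


class Barrier:
    def __init__(self, backends):
        self.backends = list(backends)
        self.generation = 0
        self.pending_state = "off"

    def emit(self, new_state):
        """Publish a new state and return the generation to wait for."""
        self.pending_state = new_state
        self.generation += 1
        return self.generation

    def absorb(self, backend):
        """Called on behalf of a backend at a safe point: adopt the state."""
        backend.local_state = self.pending_state
        backend.seen_generation = self.generation

    def all_absorbed(self, generation):
        """True once every backend has absorbed at least `generation`."""
        return all(b.seen_generation >= generation for b in self.backends)
```

In this model the emitter would only record the new cluster-wide state once
all_absorbed() returns True for the generation it emitted, which mirrors the
"until all backends have acknowledged" wording used earlier in this mail.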

Open TODO items:
================
* Immediate checkpoints - the code is currently using CHECKPOINT_IMMEDIATE in
order for the tests to run in a timely manner. This is overly aggressive, and
dialling it back while still being able to run fast tests is a TODO. Not sure
what the best option is there.

* Monitoring - an insightful off-list reviewer asked how the current progress
of the operation is monitored. So far I've been using pg_stat_activity but I
don't disagree that it's not a very sharp tool for this. Maybe we need a
specific function or view or something? There clearly needs to be a way for a
user to query state and progress of a transition.

* Throttling - right now the patch uses the vacuum access strategy, with the
same cost options as vacuum, in order to implement throttling. This is in part
due to the patch starting out modelled around autovacuum as a worker, but it
may not be the right match for throttling checksums.
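
As a reminder of what cost-based throttling in the style of vacuum amounts to,
here is a simplified sketch: every processed page adds to a running cost
balance, and once the balance reaches a limit the worker sleeps and the balance
is reset. The class name, the per-page cost of 20, and the limit and delay
values are assumptions picked for illustration; the sketch does not reproduce
the exact accounting used by vacuum or by this patch.

```python
class CostThrottle:
    def __init__(self, cost_limit=200, cost_delay_ms=2.0):
        self.cost_limit = cost_limit
        self.cost_delay_ms = cost_delay_ms
        self.balance = 0
        self.total_sleep_ms = 0.0

    def charge(self, cost):
        """Add `cost` to the balance; return how many ms the caller should sleep.

        Returns 0.0 while the balance is under the limit or when throttling
        is disabled (cost_delay_ms <= 0).
        """
        self.balance += cost
        if self.cost_delay_ms <= 0 or self.balance < self.cost_limit:
            return 0.0
        self.balance = 0
        self.total_sleep_ms += self.cost_delay_ms
        return self.cost_delay_ms


def checksum_pages(num_pages, throttle, page_cost=20):
    """Pretend to checksum `num_pages` pages; return the number of sleeps."""
    sleeps = 0
    for _ in range(num_pages):
        # A real worker would read, checksum and dirty the page here.
        if throttle.charge(page_cost) > 0:
            sleeps += 1  # a real worker would sleep here instead of counting
    return sleeps
```

With these illustrative numbers, a worker sleeps once per ten pages. Whether
that shape of throttling suits a one-off pass over every page in the cluster is
exactly the open question raised in this item.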

* Naming - the in-between states when data checksums are enabled or disabled
are called inprogress-on and inprogress-off. The reason for this is simply
that early on there were only three states: inprogress, on and off, and the
process of disabling wasn't labeled with a state. When this transition state
was added it seemed like a good idea to tack the end-goal onto the transition.
These state names make the code easily greppable but might not be the most
obvious choices for anything user facing. Are "Enabling" and "Disabling" better
terms to use (across the board or just user facing) or should we stick to the
current names?

There are ways in which this processing can be optimized to achieve better
performance, but in order to keep goalposts in sight and patch size down they
are left as future work.

--
Daniel Gustafsson

[0] https://www.postgresql.org/message-id/flat/CABUevExz9hUUOLnJVr2kpw9Cx%3Do4MCr1SVKwbupzuxP7ckNutA%40mail.gmail.com
[1] https://www.postgresql.org/message-id/flat/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh%2BziCSa5YLLU_BA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/20181030051643.elbxjww5jjgnjaxg%40alap3.anarazel.de
[3] https://www.postgresql.org/message-id/flat/FF393672-5608-46D6-9224-6620EC532693%40endpoint.com
[4] https://www.postgresql.org/message-id/flat/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com

Attachment Content-Type Size
v1-0001-Support-checksum-enable-disable-in-a-running-clus.patch application/octet-stream 121.2 KB
