From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2025-03-09 17:57:52
Message-ID: e812165e-a4a5-4f17-bf43-9c6173de153c@vondra.me
Lists: pgsql-hackers
Here's a rebased version of the patch series, addressing the issues I
pointed out in the last round of reviews. I've kept the changes in
separate patches for clarity, but they should be squashed into a single
patch in the end.
1) v20250309-0001-Online-enabling-and-disabling-of-data-chec.patch
------------------------------------------------------------------
Original patch, rebased, resolving merge conflicts.
2) v20250309-0002-simple-post-rebase-fixes.patch
------------------------------------------------
A couple of minor fixes, addressing test failures due to stuff committed
since the previous patch version. Mostly mechanical; the main change is
that I don't think the pgstat_bestart() call is necessary. Or is it?
3) v20250309-0003-sync-the-data_checksums-GUC-with-the-local.patch
------------------------------------------------------------------
This is the main change, fixing failures in 002_actions.pl. The short
version is that the test does "-C data_checksums", but AFAICS that does
not quite work, because it does not call show_data_checksums() that
early and instead just grabs the variable backing the GUC, which may be
out of sync. This patch fixes that by updating both.
That fixes the issue, but isn't it a bit strange that we now have three
places tracking the state of data checksums? We have
data_checksum_version in the control file, and then data_checksums and
LocalDataChecksumVersion in the backends.
Would it be possible to "unify" the latter two? That would also mean we
wouldn't need the duplicate constants PG_DATA_CHECKSUM_VERSION and
DATA_CHECKSUM_VERSION. Or why do we need both?
4) v20250309-0004-make-progress-reporting-work.patch
----------------------------------------------------
The progress reporting got "mostly disabled" in an earlier version, due
to issues with the bgworkers. AFAICS the issue is the "status" row can
be updated only by a single process, which does not quite work with the
launcher + per-db workers architecture.
I've considered a couple different approaches:
a) updating the status only from the launcher
This is mostly what CREATE INDEX does with parallel builds, and there
it's mostly sufficient. But for checksums it'd mean we'd only have the
number of databases to process/done, and that seems unsatisfactory,
considering large clusters often have only a single large database. So
not good enough, IMHO.
b) allowing workers to update the status row, created by the launcher
I guess this would be better - we'd know the relation/block counts. I
haven't tried coding this, but there would need to be some locking so
that workers don't overwrite increments from other workers, etc. And I
don't think it'd work nicely with parallel per-db workers (which we
don't do now, but might).
c) having one status entry per worker
I ended up doing this; it didn't require any changes to the progress
infrastructure, and it works naturally even with multiple workers.
There will always be one row for the launcher status (which only has
the number of databases total/done), and then one row per worker, with
database-level info (datid, datname, #relations, #blocks).
I removed the "DONE" phase, because that's right before the launcher
exits, and I don't think we have that for similar cases. And I added a
"waiting on checkpoint" phase, because that's often a long wait during
which the launcher seems to do nothing, so it seems useful to
communicate the reason for the wait.
5) v20250309-0005-update-docs.patch
-----------------------------------
Minor tweaks to the docs, reflecting the changes to progress reporting,
and also some corrections (no resume after restart, ...).
So far this has passed all my tests - both check-world and stress
testing (no failures / assert crashes, ...). One thing that puzzles me
is that I wasn't able to reproduce the failures reported in [1] - not
even with just the rebase + minimal fixes (0001 + 0002). My best theory
is that this is somehow machine-specific, and my laptop is too slow or
something. I'll try with the machine I used before once it becomes
available.
regards
[1]
https://www.postgresql.org/message-id/e4dbcb2c-e04a-4ba2-bff0-8d979f55960e%40vondra.me
--
Tomas Vondra
Attachment | Content-Type | Size
---|---|---
v20250309-0001-Online-enabling-and-disabling-of-data-chec.patch | text/x-patch | 137.3 KB
v20250309-0002-simple-post-rebase-fixes.patch | text/x-patch | 2.6 KB
v20250309-0003-sync-the-data_checksums-GUC-with-the-local.patch | text/x-patch | 5.0 KB
v20250309-0004-make-progress-reporting-work.patch | text/x-patch | 10.6 KB
v20250309-0005-update-docs.patch | text/x-patch | 8.4 KB