Offline activation of checksums via standby switchover (was: Online checksums patch - once again)

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Magnus Hagander <magnus(at)hagander(dot)net>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Offline activation of checksums via standby switchover (was: Online checksums patch - once again)
Date: 2021-02-10 08:06:35
Message-ID: 8f193f949b39817b9c642623e1fe7ccb94137ce4.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Am Mittwoch, den 10.02.2021, 15:06 +0900 schrieb Michael Paquier:
> On Tue, Feb 09, 2021 at 10:54:50AM +0200, Heikki Linnakangas wrote:
> > (I may have said this before, but) My overall high-level impression of this
> > patch is that it's really cmmplex for a feature that you use maybe once in
> > the lifetime of a cluster. I'm happy to review but I'm not planning to
> > commit this myself. I don't object if some other committer picks this up
> > (Magnus?).
>
> I was just looking at the latest patch set as a matter of curiosity,
> and I have a shared feeling.

I think this still would be a useful feature; not least for the online
deactivation - having to shut down the instance is sometimes just not an
option in production, even for just a few seconds.

However, there is also the shoot-the-whole-database-into-WAL (at least,
that is what happens, AIUI) issue which has not been discussed that much
either, the patch allows throttling, but I think the impact on actual
production workloads are not very clear yet.

> I think that this is a lot of complication in-core for what would be a
> one-time operation, particularly knowing that there are other ways to
> do it already with the offline checksum tool, even if that is more
> costly:
> - Involve logical replication after initializing the new instance with
> --data-checksums, or in an upgrade scenatio with pg_upgrade.

Logical replication is still somewhat unpractical for such a (possibly)
routine task, and I don't understand your pg_upgrade scenario, can
expand on that a bit?

> - Involve physical replication: stop the standby cleanly, enable
> checksums on it and do a switchover.

I would like to focus on this, so I changed the subject in order not to
derail the online acivation patch thread.

If this is something we support, then we should document it.

I have to admit that this possiblity escaped me when we first committed
offline (de)activation, it was brought to my attention via
https://twitter.com/samokhvalov/status/1281312586219188224 and the
following discussion.

So if we think this (to recap: shut down the standby, run pg_checksums
on it, start it up again, wait until it is back in sync, then
switchover) is a safe way to activate checksums on a streaming
replication setup, then we should document it I think. However, I have
only seen sorta hand-waiving on this so far and no deeper analysis of
what could possibly go wrong (but doesn't).

Anybody did some further work/tests on this and/or has something written
up to contribute to the documentation? Or do we think this is not
appropriate to document? I think once we agree this is safe, it is not
more complicated than the rsync-the-standby-after-pg_upgrade recipe we
did document.

> Another thing we could do is to improve pg_checksums with a parallel
> mode. The main design question would be how to distribute the I/O,
> and that would mean balancing at least across tablespaces.

Right. I thought about this a while ago, but didn't have time to work on
it so far.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Wanner 2021-02-10 08:10:19 Re: repeated decoding of prepared transactions
Previous Message Michael Paquier 2021-02-10 08:04:04 Re: Preserve attstattarget on REINDEX CONCURRENTLY