Re: Offline enabling/disabling of data checksums

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-20 16:46:32
Message-ID: alpine.DEB.2.21.1903201712370.24709@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Michaël-san,

>> I think that a clear warning not to run any cluster command in parallel,
>> under pain of possible cluster corruption, and possibly other caveats about
>> replication, should appear in the documentation.
>
> I still have the following extra documentation in my notes:

Ok, it should have been included in the patch.

> + <refsect1>
> + <title>Notes</title>
> + <para>
> + When disabling or enabling checksums in a cluster of multiple instances,

ISTM that a "postgres cluster" was like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an
Oracle cluster in postgres lingo, though.

> + it is recommended to stop all the instances of the cluster before doing
> + the switch to all the instances consistently.

I think that the motivation/risks should appear before the solution. "As
xyz ..., ...", or there at least the logical link should be outlined.

It is not clear for me whether the following sentences, which seems
specific to "pg_rewind", are linked to the previous advice, which seems
rather to refer to streaming replication?

> + When using a cluster with
> + tools which perform direct copies of relation file blocks (for example
> + <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> + lead to page corruptions in the shape of incorrect checksums if the
> + operation is not done consistently across all nodes. Destroying all
> + the standbys in a cluster first, enabling or disabling checksums on
> + the primary and finally recreate the cluster nodes from scratch is
> + also safe.
> + </para>
> + </refsect1>

Should not disabling in reverse order be safe? the checksum are not
checked afterwards?

> This sounds kind of enough for me but..

ISTM that the risks could be stated more clearly.

>> I also think that some mechanism should be used to prevent such occurence,
>> whether in this patch or another.
>
> I won't counter-argue on that.

This answer is ambiguous.

>> I still think that the control file update should be made *after* the data
>> directory has been synced, otherwise the control file could be updated while
>> some data files are not. It just means exchanging two lines.
>
> The parent path of the control file needs also to be flushed to make
> the change durable, just doing things the same way pg_rewind keeps the
> code simple in my opinion.

I do not understand. The issue I'm refering to is, when enabling:

- data files are updated in fs cache
- control file is updated in fs cache
* fsync is called
- updated control file gets written
- data files are being written...
but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their
checksums, although the controlfiles tells that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.

>> If the conditional sync is moved before the file update, the file update
>> should pass do_sync to update_controlfile.
>
> For the durability of the operation, the current order is intentional.

See my point above: I think that this order can lead to cluster
corruption. If not, please be kind enough to try to explain in more
details why I'm wrong.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2019-03-20 16:58:21 Re: Optimze usage of immutable functions as relation
Previous Message Tom Lane 2019-03-20 16:43:11 Re: Automated way to find actual COMMIT LSN of subxact LSN