Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

From: David Christensen <david(at)endpoint(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure
Date: 2017-03-09 22:39:04
Message-ID: 2732F6F2-B8E0-4DC7-A1CB-F883DE127C5C@endpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Mar 9, 2017, at 1:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david(at)endpoint(dot)com> wrote:
>> Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.
>
> Ah, sorry, I had missed that patch.
>
>> My design notes for the patch were submitted to the list with little comment; see: https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>>
>> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
>
> I think the concept of a checksum cycle might be overkill. It would
> be useful if we ever wanted to change the checksum algorithm, but I
> don't see any particular reason why we need to build infrastructure
> for that. If we wanted to change the checksum algorithm, I think it
> likely that we'd do that in the course of, say, widening it to 4 bytes
> as part of a bigger change in the page format, and this infrastructure
> would be too narrow to help with that.

I hear what you are saying, however a boolean on pg_class/pg_database to show if the relation in question is insufficient if we allow arbitrary enabling/disabling while enabling is in progress. In particular, if we disable checksums while the enabling was in progress and had only a boolean to indicate whether the checksums are complete for a relation there will have been a window when new pages in a relation will *not* be checksummed but the system table flag will indicate that the checksum is up-to-date, which is false. This would lead to checksum failures when those pages are encountered. Similar failures will occur as well when doing a pg_upgrade of an in-progress enabling. Saying you can’t disable/cancel the checksum process while it’s running seems undesirable too, as what happens if you have a system failure mid-process.

We could certainly avoid this problem by just saying that we have to start over with the entire cluster and process every page from scratch rather than trying to preserve/track state that we know is good, perhaps only dirtying buffers which have a non-matching checksum while we’re in the conversion state, but this seems undesirable to me, plus if we made it work in a single session we’d have a long-running snapshot open (presumably) which would wreak havoc while it processes the entire database (as someone had suggested by doing it in a single function that just hangs while it’s running)

I think we still need a way to short-circuit the process/incrementally update and note which tables have been checksummed and which ones haven’t. (Perhaps I could make the code which currently checks DataChecksumsEnabled() a little smarter, depending on the relation it’s looking at.)

As per one of Jim’s suggestions, I’ve been looking at making the data_checkum_state localized per database at postinit time, but really it probably doesn’t matter to anything but the buffer code whether it’s a global setting, a per-database setting or what.

> While I'm glad to see work finally going on to allow enabling and
> disabling checksums, I think it's too late to put this in v10. We
> have a rough rule that large new patches shouldn't be appearing for
> the first time in the last CommitFest, and I think this falls into
> that category. I also think it would be unwise to commit just the
> bits of the infrastructure that let us disable checksums without
> having time for a thorough review of the whole thing; to me, the
> chances that we'll make decisions that we later regret seem fairly
> high. I would rather wait for v11 and have a little more time to try
> to get everything right.

Makes sense, but to that end I think we need to have at least some agreement on how this should work ahead of time. Obviously it’s easier to critique existing code, but I don’t want to go out in left field of a particular approach is going to be obviously unworkable per some of the more experienced devs here.
--
David Christensen
End Point Corporation
david(at)endpoint(dot)com
785-727-1171

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Naytro Naytro 2017-03-09 22:40:09 Re: Performance issue after upgrading from 9.4 to 9.6
Previous Message Naytro Naytro 2017-03-09 22:38:46 Re: Performance issue after upgrading from 9.4 to 9.6