From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Jeff Davis <pgsql(at)j-davis(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Enabling Checksums |
Date: | 2012-11-14 16:46:54 |
Message-ID: | CA+Tgmoa6a7UgKKbrpFTSHUuDUiXt0_HwqGUHiwjW7rK3d9h+uQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Nov 13, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What happens when you get an I/O failure on the checksum fork? Assuming
> you're using 8K pages there, that would mean you can no longer verify
> the integrity of between one and four thousand pages of data.
True... but you'll have succeeded in your central aim of determining
whether your hardware has crapped out. Answer: yes.
The existing code doesn't have any problem reporting back to the user
those hardware failures which are reported to it by the OS. The only
reason for the feature is for the database to be able to detect
hardware failures in situations where the OS claims that everything is
working just fine.
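
As a rough check on the "between one and four thousand pages" figure quoted above, here is a minimal sketch of the arithmetic. It assumes the 8K page size from the quoted text, checksum widths of 2, 4, or 8 bytes (the thread does not fix a width, so these are illustrative), and a checksum-fork page with no header overhead. The function name `pages_covered` is hypothetical.

```python
PAGE_SIZE = 8192  # bytes; the "8K pages" assumed in the quoted text


def pages_covered(checksum_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of data pages whose checksums fit in one checksum-fork page.

    Assumption: the fork page holds nothing but checksums (no page header).
    """
    return page_size // checksum_bytes


# Illustrative checksum widths only; the thread does not specify one.
for width in (2, 4, 8):
    print(f"{width}-byte checksums: {pages_covered(width)} data pages per fork page")
```

Under these assumptions, losing one checksum-fork page to an I/O error leaves roughly 1024 to 4096 data pages unverifiable, which matches the quoted range. That I/O error is itself reported by the OS, so the failure is still detected.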
> Not to mention the race condition problems associated with trying to be
> sure the checksum updates hit the disk at the same time as the data-page
> updates.
>
> I think you really have to store the checksums *with* the data they're
> supposedly protecting.
If torn pages didn't exist, I'd agree with you, but they do. Any
checksum feature is going to need to cope with the fact that, prior to
reaching consistency, there will be blocks on disk with checksums that
don't match, because 8kB writes are not atomic. We fix that by
unconditionally overwriting the possibly-torn pages with full-page
images, and we could simply update the checksum fork at the same time.
We don't have to do anything special to make sure that the next
checkpoint cycle successfully flushes both pages to disk before
declaring the checkpoint a success and moving the redo pointer; that
logic already exists.
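
The torn-page argument can be sketched as a toy simulation. This is not PostgreSQL code. The dict-based "forks", the 512-byte sector size, the helper names, and the use of `zlib.crc32` as the checksum are all assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 8192
SECTOR = 512  # assumed atomic write unit; real hardware may differ


def checksum(page: bytes) -> int:
    # Stand-in checksum; the thread does not specify an algorithm.
    return zlib.crc32(page)


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash partway through a non-atomic 8kB write."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


old_page = bytes([1]) * PAGE_SIZE
new_page = bytes([2]) * PAGE_SIZE

# Toy on-disk state: block 0 of the data fork and its separately stored checksum.
data_fork = {0: old_page}
checksum_fork = {0: checksum(old_page)}

# Crash mid-write: only 3 of 16 sectors reach disk, and the checksum fork
# still holds the checksum of the old page.
data_fork[0] = torn_write(old_page, new_page, 3)
torn_detected = checksum(data_fork[0]) != checksum_fork[0]


def replay_full_page_image(blkno: int, image: bytes) -> None:
    """Recovery step: overwrite the possibly-torn page with its full-page
    image and update the checksum fork at the same time."""
    data_fork[blkno] = image
    checksum_fork[blkno] = checksum(image)


replay_full_page_image(0, new_page)
consistent = checksum(data_fork[0]) == checksum_fork[0]

print("mismatch before replay:", torn_detected)
print("match after replay:", consistent)
```

The point of the sketch is that the mismatch before replay exists wherever the checksum is stored, in the page or in a separate fork. The unconditional overwrite from the full-page image removes it in both cases.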
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company