From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Block-level CRC checks |
Date: | 2009-12-01 11:35:42 |
Message-ID: | 200912011135.nB1BZgs15378@momjian.us |
Lists: | pgsql-hackers |
Simon Riggs wrote:
> The way we handle torn page corruptions *hides* actual corruptions from
> us. The frequency of true positives and false positives is important
> here. If the false positive ratio is very small, then reporting them is
> not a problem because of the benefit we get from having spotted the true
> positives. Some convicted murderers didn't do it, but that is not an
> argument for letting them all go free (without knowing the details). So
> we need to know what the false positive ratio is before we evaluate the
> benefit of either reporting or non-reporting possible corruption events.
>
> When do you think torn pages happen? Only at crash, or other times also?
> Do they always happen at crash? Are there ways to re-check a block that
> has suffered a hint-related torn page issue? Are there ways to isolate
> and minimise the reporting of false positives? Those are important
> questions and this is not black and white.
>
> If the *only* answer really is we-must-WAL-log everything, then that is
> the answer, as an option. I suspect that there is a less strict
> possibility, if we question our assumptions and look at the frequencies.
>
> We know that I have no time to work on this; I am just trying to hold
> open the door to a few possibilities that we have not fully considered
> in a balanced way. And I myself am guilty of having slammed the door
> previously. I encourage development of a way forward based upon a
> balance of utility.
I think the problem boils down to what the user's response should be to
a corruption report. If it is a torn page, it will be corrected and the
user doesn't have to do anything. If it is something that is not
correctable, then the user has corruption and/or bad hardware. The
trouble is that the existing proposal can't distinguish between these
two cases, so the user has no idea how to respond to the report.
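
To make the ambiguity concrete, here is a rough sketch, not the proposed
patch. The page layout (a 4-byte CRC in front of the body), the 512-byte
sector size, and the names make_page and verify are all assumptions made
up for illustration; none of this is PostgreSQL's real page format. It
only shows that a torn hint-bit write and a genuine bit flip both come
back from the checker as the same bare "mismatch".

```python
import zlib

PAGE_SIZE = 8192   # assumed page size for this sketch
SECTOR = 512       # assumed atomic write unit of the disk

def make_page(body: bytes) -> bytes:
    """Hypothetical layout: 4-byte CRC-32 followed by the page body."""
    body = body.ljust(PAGE_SIZE - 4, b"\x00")
    return zlib.crc32(body).to_bytes(4, "big") + body

def verify(page: bytes) -> bool:
    """True if the stored CRC matches the CRC of the body."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])

# Version of the page already on disk.
old = make_page(b"tuple data, hint bits clear")

# Same page after a hint bit is set far into the page, with no WAL record.
new_body = bytearray(old[4:])
new_body[6000] |= 0x01
new = make_page(bytes(new_body))

# Case 1, torn page: crash after only the first sector (holding the new
# CRC) reached disk; the remaining sectors still hold the old contents.
torn = new[:SECTOR] + old[SECTOR:]

# Case 2, real corruption: a single bit flipped in the old page.
damaged = bytearray(old)
damaged[3000] ^= 0x40
damaged = bytes(damaged)

for name, page in [("old", old), ("new", new),
                   ("torn", torn), ("damaged", damaged)]:
    print(name, "ok" if verify(page) else "CRC mismatch")
```

Both "torn" and "damaged" fail the check, and nothing in the result says
which one is harmless and which one means the hardware should be looked
at.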
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +