From: | Jim Nasby <jim(at)nasby(dot)net> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enabling Checksums |
Date: | 2013-03-23 04:19:51 |
Message-ID: | 514D2D67.2070404@nasby.net |
Lists: | pgsql-hackers |
I realize Simon relented on this, but FWIW...
On 3/16/13 4:02 PM, Simon Riggs wrote:
> Most other data we store doesn't consist of
> large runs of 0x00 or 0xFF as data. Most data is more complex than
> that, so any runs of 0s or 1s written to the block will be detected.
...
It's not that uncommon for folks to have tables that have a bunch of int[2,4,8]s all in a row, and I'd bet it's not uncommon for a lot of those fields to be zero.
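To put a rough number on that, here is a minimal sketch that packs one hypothetical row of int2/int4/int8 fields and counts its zero bytes. The field layout, the values, and the `zero_byte_stats` helper are all made up for illustration; it ignores the real tuple header, null bitmap and alignment padding, so treat it as a back-of-envelope check rather than what an actual heap page looks like.

```python
import struct

def zero_byte_stats(row_bytes: bytes):
    """Return (fraction of zero bytes, longest run of consecutive zero bytes)."""
    zero_count = 0
    longest = current = 0
    for b in row_bytes:
        if b == 0:
            zero_count += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return zero_count / len(row_bytes), longest

# Hypothetical row: int2, int4, int8, int4, int4, where most fields are zero.
# "<" means little-endian with no padding (an assumption, not PostgreSQL's
# actual on-disk alignment rules).
row = struct.pack("<hiqii", 7, 0, 0, 0, 3)

fraction, longest = zero_byte_stats(row)
print(f"{len(row)} bytes, {fraction:.0%} zero, longest zero run {longest} bytes")
```

Even with two non-zero fields out of five, the bulk of the row is 0x00, and small integer values contribute zero bytes of their own in the high-order positions.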
> Checksums are for detecting problems. What kind of problems? Sporadic
> changes of bits? Or repeated errors. If we were trying to trap
> isolated bit changes then CRC-32 would be appropriate. But I'm
> assuming that whatever causes the problem is going to recur,
That's the opposite of my experience. When we've had corruption events, we will normally have one to several blocks with problems show up essentially all at once. Of course we can't prove that all the corruption happened at exactly the same time, but I believe it's a strong possibility. If it wasn't exactly the same time, it was certainly over a span of minutes to hours... *but* we've never seen new corruption occur after we start an investigation (we frequently wait several hours for the next time we can take an outage without incurring a huge loss in revenue). That we would run for a number of hours with no additional corruption leads me to believe that whatever caused the corruption was essentially a "one-time" [1] event.
[1] One-time except for the fact that there were several periods where we would have corruption occur at 6 or 12 month intervals.