From: | Torsten Zuehlsdorff <mailinglists(at)toco-domains(dot)de> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Checksums by default? |
Date: | 2017-01-24 08:05:35 |
Message-ID: | dc9adb10-a026-6850-8ad3-e8d44a3629d4@toco-domains.de |
Lists: | pgsql-hackers |
On 21.01.2017 19:37, Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>>> Because I see having checksums as, frankly, something we always should
>>> have had (as most other databases do, for good reason...) and because
>>> they will hopefully prevent data loss. I'm willing to give us a fair
>>> bit to minimize the risk of losing data.
>>
>> To be perfectly blunt, that's just magical thinking. Checksums don't
>> prevent data loss in any way, shape, or form. In fact, they can *cause*
>> data loss, or at least make it harder for you to retrieve your data,
>> in the event of bugs causing false-positive checksum failures.
>
> This is not a new argument, at least to me, and I don't agree with it.
I don't agree with it either. Yes, statistically, checksums make data
loss somewhat more likely: the I/O load is higher, so the disk has more
to do and fails sooner.
But the same is true for RAID: adding more disks increases the odds of a
disk failure.
So: yes. If you use checksums on a single disk, it's more likely to cause
problems. But if you manage it right (as ZFS does, for example), it's an
overall gain.
>> What checksums can do for you, perhaps, is notify you in a reasonably
>> timely fashion if you've already lost data due to storage-subsystem
>> problems. But in a pretty high percentage of cases, that fact would
>> be extremely obvious anyway, because of visible data corruption.
>
> Exactly, and that awareness will allow a user to prevent further data
> loss or corruption. Slow corruption over time is a very much known and
> accepted real-world case that people do experience, as well as bit
> flipping enough for someone to write a not-that-old blog post about
> them:
>
> https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1
>
> A really nice property of checksums on pages is that they also tell you
> what data you *didn't* lose, which can be extremely valuable.
Indeed!
Greetings,
Torsten