From: | Daniel Farina <daniel(at)heroku(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, Florian Pflug <fgp(at)phlo(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enabling Checksums |
Date: | 2013-04-18 10:43:43 |
Message-ID: | CAAZKuFZFGJn_j_RsHw6-dnqrW5y8JYDD4kM1YkKsRVym36ELOA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 17, 2013 at 11:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-04-17 18:16:36 -0700, Daniel Farina wrote:
>> The original paper is often shorthanded "Castagnoli 93", but it exists
>> in the IEEE's sphere of influence and is hard to find a copy of.
>> Luckily, a pretty interesting survey paper discussing some of the
>> issues was written by Koopman in 2002 and is available:
>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a
>> pedagolgical note, it's pretty interesting and accessible piece of
>> writing (for me, as someone who knows little of error
>> detection/correction) and explains some of the engineering reasons
>> that provoke such exercises.
>
> http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=231911&userType=inst
>
> There's also a koopman paper from 2004 thats interesting.
Having read the 2002 paper more, it seems that the current CRC32
doesn't have a whole lot going for it: CRC32C pretty much cleans its
clock across the board (I don't understand detected Hamming Distance
that seem greater than the information content of the message, e.g. HD
14 with 8 bit messages as seen in CRC32C: that's where CRC32 can "win").
CRC32C looks, all in all, the most flexible, because detection of
Hamming Distance 4 spans from 5244-131072 bits (the upper range of
which is a full 16KiB!) and there is superior Hamming Distance
detection on shorter messages up until the point where it seems like
the Hamming Distance able to be detected is larger than the message
size itself (e.g. HM 13 on an 8 bit message). I'm not sure if this is
an error in my understanding, or what.
Also, larger runs (16KB) are better served by CRC32C: even the
probably-best contender I can see (0xD419CC15) drops to Hamming
Distance 2-detection right after 65505 bits. CRC32C has the biggest
range at HD4, although Koopman 0xBA0DC66 comes close, gaining superior
Hamming distance detection for 178-16360 bits (the upper end of this
rnage is short of 2KiB by 3 bytes).
All in all, there is no reason I can see to keep CRC32 at all, vs
CRC32C on the basis of error detection alone, so putting aside all the
business about instruction set architecture, I think a software CRC32C
in a vacuum can be seen as a robustness improvement.
There may be polynomials that are not CRC32 or CRC32C that one might
view as having slightly better tradeoffs as seen in Table 1 of Koopman
2002, but it's kind of a stretch: being able to handle 8KB and 16KB as
seen in CRC32C at HD4 as seen in CRC32C is awfully compelling to me.
Koopman 0xBA0DC66B can admirably reach HD6 on a much larger range, up
to 16360 bytes, which is every so shy of 2KiB. Castagnoli 0xD419CC15
can, short of 8KB by 31 bits can detect HD 5.
Corrections welcome on my interpretations of Tbl 1.
From | Date | Subject | |
---|---|---|---|
Next Message | Ants Aasma | 2013-04-18 14:02:50 | Re: Enabling Checksums |
Previous Message | Dimitri Fontaine | 2013-04-18 09:05:15 | Re: event trigger API documentation? |