What exactly is our CRC algorithm?

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: What exactly is our CRC algorithm?
Date: 2014-10-08 19:13:46
Message-ID: 54358CEA.8080809@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Our CRC algorithm is a bit weird. It's supposedly CRC-32, with the same
polynomial as used in Ethernet et al, but it actually is not. The
comments refer to "Painless Guide to CRC Error Detection Algorithms" by
Ross N. Williams [1] (http://www.ross.net/crc/download/crc_v3.txt) but
I think it was implemented incorrectly.

As a test case, I used an input of a single zero byte. I calculated the
CRC using Postgres' INIT_CRC32+COMP_CRC32+FIN_CRC32, and compared with
various online CRC calculation tools and C snippets. The Postgres
algorithm produces the value 2D02EF72, while the correct one is
D202EF8D. The first and last byte are inverted. For longer inputs, the
values diverge, and I can't see any obvious pattern between the Postgres
and correct values.

There are many variants of CRC calculations, as explained in Ross's
guide. But ours doesn't seem to correspond to the reversed or reflected
variants either.

I compiled the code from Ross's document, and built a small test program
to test it. I used Ross's "reverse" lookup table, which is the same
table we use in Postgres. It produces this output:

Calculating CRC-32 (polynomial 04C11DB7) for a single zero byte:
D202EF8D 11010010000000101110111110001101 (simple)
2D02EF72 10101101000000101110111101110010 (lookup)
D202EF8D 11010010000000101110111110001101 (lookup reflected)

Hmm. So the simple, non-table driven, calculation gives the same result
as using the lookup table using the reflected lookup code. That's
expected; the lookup method is supposed to be the same, just faster.
However, using the "normal" lookup code, but with a "reflected" lookup
table, produces the same result as Postgres' algorithm. Indeed, that's
what we do in PostgreSQL. But AFAICS, that's an incorrect combination.
You're supposed to the non-reflected lookup table with the non-reflected
lookup code; you can't mix and match.

As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
correspond to any bit-by-bit CRC variant and polynomial. My math skills
are not strong enough to determine what the consequences of that are. It
might still be a decent checksum. Or not. I couldn't tell if the good
error detection properties of the normal CRC-32 polynomial apply to our
algorithm or not.

Thoughts? Attached is the test program I used for this.

- Heikki

Attachment Content-Type Size
crcmodel-1.tar.gz application/gzip 5.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-10-08 19:19:01 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Peter Geoghegan 2014-10-08 19:12:57 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}