Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

From: Andres Freund <andres(at)anarazel(dot)de>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>
Subject: Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date: 2024-12-14 15:24:08
Message-ID: mclb5qcupjfkzctmjkaygy4b7ecnimttwhqljenrustz7om4ed@iko77qgbs3bc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> On Thu, Jun 13, 2024 at 2:37 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > It's hard to understand, but a nonetheless helpful page is
> > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> > crc32c:
> > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt
> > which lists
> > (0x8f6e37a0; 0x11edc6f41) <=> (0x82f63b78; 0x105ec76f1) {2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1} | gold | (*op) iSCSI; CRC-32C; CRC-32/4
> >
> > This cryptic notion AFAIU indicates that for our polynomial we can detect 2bit
> > errors up to a length of 2147483615 bytes, 3 bit errors up to 2147483615, 3
> > and 4 bit errors up to 5243, 5 and 6 bit errors up to 177, 7/8 bit errors up
> > to 47.
>
> One aspect of that cryptic notation that you seemed to have missed is
> "(*op)" -- explained as:
>
> *p - primitive polynomial. This has optimal length for HD=3, and good
> HD=2 performance above that length.
> *o - odd bit errors detected. This has a factor of (x+1) and detects
> all odd bit errors (implying that even number of bit errors have an
> elevated undetected error rate)
> *op - odd bit errors detected plus primitive. This is a primitive
> polynomial times (x+1). It has optimal length for HD=4, and detects
> all odd bit errors.
>
> This means it's not really a 32-bit checksum -- it's a 1-bit checksum
> plus a 31-bit checksum. The 1-bit checksum can detect any odd number
> of bit-flips. Do we really want to throw that property away?

I think it's pretty much irrelevant for our usecase.

What the WAL checksum needs to protect against are cases like a record
spanning >1 disk sectors or >1 OS pages and one of those sectors/pages not
having made it to disk, while the rest has made it (and thus shows old
contents).

That means we have to detect runs of "wrong content" that are *never* in the
single bit range (since sector boundaries never fall within a bit), *never*
within a 4 byte range (because that's what we IIRC align records to, and
again, sector boundaries don't fall within aligned 4 byte quantities).

Because the likely causes of failure are parts of the correct record and then
a tail or an intermittent long chunk (>= 1 sector) of wrong content, detecting
certain number of bit flips just doesn't help.

Bit flips are an important thing to detect and correct when they are something
that can happen in isolation. E.g. a bunch of interference in an ethernet
cable. Or the charge in an individual flash cell being a tiny bit above/below
some threshold. But that's just not what we have with WAL.

It's also worth noting that just about *all* permanent storage already has
applied sector-level checksums, protecting against (and correcting) bit flips
at that level.

> Sure, for an even number bitflips beyond a small number, we're left
> with the luck ordinary collisions, and CRC is not particularly great,

I.e. just about *all* failure scenarios for WAL.

> but for two messages of the same length, I'm also not sure it's all
> that bad, either

Our records rarely have the same length, no?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2024-12-14 15:40:42 Re: proposal: schema variables
Previous Message Junwang Zhao 2024-12-14 13:53:05 Re: SQL Property Graph Queries (SQL/PGQ)