Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>
Cc: "Sterrett, Matthew" <matthewsterrett2(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date: 2025-01-22 09:48:59
Message-ID: CANWCAZbkt89_fVAaCAGBMznwA_xh=2Ci5q4GZytZHKjZAEjCRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 22, 2025 at 12:46 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
>
> Is there any additional feedback for this patch?

Hi Raghuveer,

I raised one question and one concern upthread. I will repeat them
here for convenience.

#1 - The choice of AVX-512. There is no such thing as a "CRC
instruction operating on 8 bytes", and the proposed algorithm is a
multistep process using carryless multiplication and requiring at
least 256 bytes of input. The Chromium sources cited as the source for
this patch also contain an implementation using 128-bit instructions,
and which only requires at least 64 bytes of input. Is there a reason
that not tested or proposed as well? That would be much easier to
read/maintain, work on more systems, and might give a speed boost on
smaller inputs. These are useful properties to have.

https://github.com/chromium/chromium/blob/main/third_party/zlib/crc32_simd.c#L215

#2 - The legal status of the algorithm from following Intel white
paper, which is missing from its original location, archived here:

https://web.archive.org/web/20220802143127/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf

This algorithm is the most portable and can in fact be coded with
plain C, no additional instructions. The only disadvantage is that
with pure C it's only useful on input with hundreds of bytes. But that
limitation is not that different from the AVX-512 proposal in this
regard.

My question on this paper is about this passage:

"The basic concepts in this paper are derived from and explained in detail in
the patents and pending applications [4][5][6]."
...
[4] Determining a Message Residue, Gopal et al. United States Patent 7,886,214
[5] Determining a Message Residue Gueron et al. United States Patent Application
20090019342
[6] Determining a Message Residue Gopal et al. United States Patent Application
20090158132

Looking at Linux kernel sources, it seems a patch using this technique
was contributed by Intel over a decade ago:

https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-asm_64.S

...so I'm unclear if these patents are applicable to software
implementations. They also seem to be expired, but I am not a lawyer.
Could you look into this please? Even if we do end up with AVX-512,
this would be a good fallback.

--
John Naylor
Amazon Web Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Frédéric Yhuel 2025-01-22 09:56:43 Re: doc: explain pgstatindex fragmentation
Previous Message Dave Page 2025-01-22 09:25:30 Re: Windows: openssl & gssapi dislike each other