From: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | "Sterrett, Matthew" <matthewsterrett2(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | RE: Proposal for Updating CRC32C with AVX-512 Algorithm. |
Date: | 2025-01-24 20:34:39 |
Message-ID: | PH8PR11MB828671F385396D8736465528FBE32@PH8PR11MB8286.namprd11.prod.outlook.com |
Lists: | pgsql-hackers |
Hi John,
Thanks for the summary. My responses are inline:
> #1 - The choice of AVX-512. There is no such thing as a "CRC instruction operating
> on 8 bytes", and the proposed algorithm is a multistep process using carryless
> multiplication and requiring at least 256 bytes of input. The Chromium sources
> cited as the source for this patch also contain an implementation using 128-bit
> instructions, and which only requires at least 64 bytes of input. Is there a reason
> that was not tested or proposed as well? That would be much easier to read/maintain,
> work on more systems, and might give a speed boost on smaller inputs. These are
> useful properties to have.
>
> https://github.com/chromium/chromium/blob/main/third_party/zlib/crc32_simd.c#L215
Agreed. Postgres already has the SSE4.2 version, pg_comp_crc32c_sse42, but I didn't
realize it uses the crc32 instruction, which processes only 8 bytes at a time. This can
certainly be upgraded to process 64 bytes at a time and should be faster. Since most
of the AVX-512 work is almost ready, I propose doing this in a follow-up patch right after it.
Let me know if you disagree. The AVX-512 version processes 256 bytes at a time and will
most certainly be faster than the improved SSE4.2 version, which is why the Chromium
library has both AVX-512 and SSE4.2 implementations.
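To make the "8 bytes at a time" point concrete, here is a minimal sketch of that style of loop. It is not the actual pg_comp_crc32c_sse42 code: the names crc32c_sketch, crc32c_sse42_8 and crc32c_bitwise are hypothetical, and the sketch assumes GCC or Clang on x86-64 for the hardware path. Elsewhere it falls back to a plain bit-at-a-time loop so it still builds and runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang on x86-64; other builds use only the portable loop. */
#if defined(__GNUC__) && defined(__x86_64__)
#include <nmmintrin.h>
#define SKETCH_HAVE_SSE42 1
#endif

/* Portable bit-at-a-time CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#ifdef SKETCH_HAVE_SSE42
/* One crc32 instruction per 8 input bytes, then a byte-wise tail. */
__attribute__((target("sse4.2")))
static uint32_t
crc32c_sse42_8(uint32_t crc, const unsigned char *p, size_t len)
{
    uint64_t c = crc;

    while (len >= 8)
    {
        uint64_t chunk;

        memcpy(&chunk, p, sizeof chunk);    /* avoids unaligned loads */
        c = _mm_crc32_u64(c, chunk);
        p += 8;
        len -= 8;
    }
    crc = (uint32_t) c;
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

/* Hypothetical entry point: standard init and final XOR of 0xFFFFFFFF. */
uint32_t
crc32c_sketch(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

#ifdef SKETCH_HAVE_SSE42
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_sse42_8(crc, p, len) ^ 0xFFFFFFFFu;
#endif
    return crc32c_bitwise(crc, p, len) ^ 0xFFFFFFFFu;
}
```

Each iteration of the main loop depends on the previous CRC value, so throughput is bounded by the latency of the crc32 instruction. That dependency chain is what the wider, carryless-multiplication approaches are meant to break.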
>
> #2 - The legal status of the algorithm from the following Intel white paper, which is
> missing from its original location, archived here:
>
> https://web.archive.org/web/20220802143127/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
>
> https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
>
> ...so I'm unclear if these patents are applicable to software implementations.
> They also seem to be expired, but I am not a lawyer.
> Could you look into this please? Even if we do end up with AVX-512, this would be
> a good fallback.
Given that SSE4.2 is available in pretty much all x86 processors at this point, do we need a
fallback C version, especially after we improve the SSE4.2 version?
Raghuveer