RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

From: "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date: 2024-08-27 20:42:14
Message-ID: BL1PR11MB5304DE16706199160F42F7AADC942@BL1PR11MB5304.namprd11.prod.outlook.com
Lists: pgsql-hackers

> Things like sizeof() and offsetof() are known at compile time, so the compiler
> will recognize when a condition is always true or false and optimize it out
> accordingly. In cases where the value cannot be known at compile time,
> checking the length in the macro and dispatching to a different
> implementation may still be advantageous, especially when the different
> implementation doesn't involve function pointers.

OK, multiple issues are resolved and I have new numbers:

1) Implemented the new COMP_CRC32C macro, with the comparison and the choice of AVX-512 vs. SSE4.2 made at compile time for static structures.
2) You were right about the baseline numbers. It seems the binaries were compiled with the direct-call version of the SSE 4.2 CRC implementation, thus avoiding the function pointer. I rebuilt with USE_SSE42_CRC32C_WITH_RUNTIME_CHECK for the numbers below.
3) Ran through all the tests again and ended up with no regression (meaning run sets would fall either 0.5% below or 1.5% above the baseline), and the margin of error was MUCH tighter this time at ~3%. :)
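
For readers following along, the idea in (1) can be sketched roughly as below. This is only an illustration and not the code from the attached patches: the macro name COMP_CRC32C_SKETCH, the two path functions, the call counters, and the 256-byte cutoff are all hypothetical, and both paths simply call a portable bitwise CRC-32C as a stand-in for the SSE 4.2 and AVX-512 routines. The point it shows is that when the length is a compile-time constant such as sizeof(), the compiler can fold the comparison and emit a direct call to a single path, with no function pointer involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the real value used by the patch is not shown here. */
#define CRC_WIDE_MIN_LEN ((size_t) 256)

/* Portable bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Counters so the example can show which path was taken. */
static int short_calls;
static int wide_calls;

/* Stand-in for the SSE 4.2 routine (assumption: called directly). */
static uint32_t
crc32c_short_path(uint32_t crc, const void *data, size_t len)
{
    short_calls++;
    return crc32c_bitwise(crc, data, len);
}

/* Stand-in for the AVX-512 routine (assumption: called directly). */
static uint32_t
crc32c_wide_path(uint32_t crc, const void *data, size_t len)
{
    wide_calls++;
    return crc32c_bitwise(crc, data, len);
}

/*
 * Length-based dispatch. If (len) is a compile-time constant, the
 * condition is constant too and the untaken branch can be optimized out.
 * Note that (len) is evaluated twice, so it must have no side effects.
 */
#define COMP_CRC32C_SKETCH(crc, data, len) \
    ((crc) = ((len) < CRC_WIDE_MIN_LEN) \
        ? crc32c_short_path((crc), (data), (len)) \
        : crc32c_wide_path((crc), (data), (len)))
```

With this sketch, COMP_CRC32C_SKETCH(crc, &hdr, sizeof(hdr)) on a small struct would take the short path, while the same macro applied to an 8192-byte buffer would take the wide path. When the length is only known at run time, the comparison stays in the generated code as an ordinary branch.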

New Table of Rates (looks correct with fixed font width) below:

+------------------+-----------------+-----------------+-----------------+--------+------+
| Rate in bytes/us |    SDP (SPR)    |       m6i       |       m7i       |        |      |
+------------------+--------+--------+--------+--------+--------+--------+ Multi- |      |
| higher is better | SSE42  | AVX512 | SSE42  | AVX512 | SSE42  | AVX512 | plier  |  %   |
+==================+========+========+========+========+========+========+========+======+
| AVG Rate 64-8192 | 10,095 | 82,101 |  8,591 | 38,652 | 11,867 | 83,194 |   6.68 | 568% |
+------------------+--------+--------+--------+--------+--------+--------+--------+------+
| AVG Rate 64-255  |  9,034 |  9,136 |  7,619 |  7,437 |  9,030 |  9,293 |   1.01 |   1% |
+------------------+--------+--------+--------+--------+--------+--------+--------+------+

* With a data profile of 99% buffer sizes <256 bytes, the improvement is still 6% and will not regress (except within the margin of error)!
* There is no longer a regression (previously showing a 14% regression).

Thanks for the pointers!!!
Paul

Attachment Content-Type Size
0001-v4-Refactor-Move-all-HW-checks-to-common-file.patch application/octet-stream 16.3 KB
0002-v4-Feat-Add-support-for-the-SIMD-AVX-512-crc32c-algorit.patch application/octet-stream 40.7 KB
0003-v4-Feat-New-COMP_CRC32C-macro-for-AVX512-simplify-code-.patch application/octet-stream 7.5 KB
