From: | "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | RE: Proposal for Updating CRC32C with AVX-512 Algorithm. |
Date: | 2024-08-27 20:42:14 |
Message-ID: | BL1PR11MB5304DE16706199160F42F7AADC942@BL1PR11MB5304.namprd11.prod.outlook.com |
Lists: | pgsql-hackers |
> Things like sizeof() and offsetof() are known at compile time, so the compiler
> will recognize when a condition is always true or false and optimize it out
> accordingly. In cases where the value cannot be known at compile time,
> checking the length in the macro and dispatching to a different
> implementation may still be advantageous, especially when the different
> implementation doesn't involve function pointers.
OK, multiple issues are resolved and I have new numbers:
1) Implemented the new COMP_CRC32C macro, which compares the length and chooses AVX-512 vs. SSE 4.2 at compile time for static structures.
2) You were right about the baseline numbers. It seems the binaries were compiled with the direct-call version of the SSE 4.2 CRC implementation, thus avoiding the function pointer. I rebuilt with USE_SSE42_CRC32C_WITH_RUNTIME_CHECK for the numbers below.
3) Ran through all the tests again and ended up with no regression (meaning run sets fell between 0.5% below and 1.5% above the baseline), and the margin of error was MUCH tighter this time at ~3%. :)
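
For readers following along, the compile-time dispatch described in item 1 and in the quoted text can be sketched roughly as below. This is only an illustration, not the code in the attached patches: the names COMP_CRC32C_SKETCH, CRC_WIDE_THRESHOLD, crc32c_small and crc32c_wide are hypothetical, the 256-byte cutoff is an assumption based on the buffer sizes discussed in this thread, and a portable bitwise CRC-32C stands in for both the SSE 4.2 and AVX-512 paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; an assumption for illustration, not taken from the patch. */
#define CRC_WIDE_THRESHOLD 256

/*
 * Portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Stand-in for the hardware implementations so the sketch runs anywhere.
 */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(data != NULL || len == 0);
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Call counters, only so the chosen path can be observed. */
static int small_calls = 0;
static int wide_calls = 0;

/* Hypothetical "short input" path (SSE 4.2 in the real proposal). */
static inline uint32_t
crc32c_small(uint32_t crc, const void *data, size_t len)
{
    small_calls++;
    return crc32c_bitwise(crc, data, len);
}

/* Hypothetical "long input" path (AVX-512 in the real proposal). */
static inline uint32_t
crc32c_wide(uint32_t crc, const void *data, size_t len)
{
    wide_calls++;
    return crc32c_bitwise(crc, data, len);
}

/*
 * When len is a constant expression such as sizeof(x), the comparison is
 * known at compile time and an optimizing compiler can drop the untaken
 * branch. Otherwise it is an ordinary runtime length check. Note that
 * len is evaluated more than once, so it must not have side effects.
 */
#define COMP_CRC32C_SKETCH(crc, data, len) \
    ((crc) = ((len) < CRC_WIDE_THRESHOLD) \
        ? crc32c_small((crc), (data), (len)) \
        : crc32c_wide((crc), (data), (len)))
```

With this shape, a call such as COMP_CRC32C_SKETCH(crc, &hdr, sizeof(hdr)) on a small fixed-size structure resolves to the short path without any function pointer, while a call on an 8192-byte page goes to the wide path.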
New Table of Rates (looks correct with fixed font width) below:
+------------------+-----------------+-----------------+-----------------+--------+------+
| Rate in bytes/us |    SDP (SPR)    |       m6i       |       m7i       |        |      |
+------------------+-----------------+-----------------+-----------------+ Multi- |      |
| higher is better | SSE42  | AVX512 | SSE42  | AVX512 | SSE42  | AVX512 | plier  |  %   |
+==================+========+========+========+========+========+========+========+======+
| AVG Rate 64-8192 | 10,095 | 82,101 |  8,591 | 38,652 | 11,867 | 83,194 |  6.68  | 568% |
+------------------+--------+--------+--------+--------+--------+--------+--------+------+
| AVG Rate 64-255  |  9,034 |  9,136 |  7,619 |  7,437 |  9,030 |  9,293 |  1.01  |   1% |
+------------------+--------+--------+--------+--------+--------+--------+--------+------+
* With a data profile of 99% buffer sizes <256 bytes, the improvement is still 6% and will not regress (except within the margin of error)!
* There is no longer a regression (previously it showed a 14% regression).
Thanks for the pointers!!!
Paul
Attachment | Content-Type | Size |
---|---|---|
0001-v4-Refactor-Move-all-HW-checks-to-common-file.patch | application/octet-stream | 16.3 KB |
0002-v4-Feat-Add-support-for-the-SIMD-AVX-512-crc32c-algorit.patch | application/octet-stream | 40.7 KB |
0003-v4-Feat-New-COMP_CRC32C-macro-for-AVX512-simplify-code-.patch | application/octet-stream | 7.5 KB |