From: | "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com> |
---|---|
To: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | RE: Proposal for Updating CRC32C with AVX-512 Algorithm. |
Date: | 2024-05-17 16:21:19 |
Message-ID: | BN0SPR01MB00084DB3E6F61E09F59533FFDCEE2@BN0SPR01MB0008.namprd11.prod.outlook.com |
Lists: | pgsql-hackers |
Hi, forgive the top-post, but I have not seen any response to this post.
Thanks,
Paul
> -----Original Message-----
> From: Amonson, Paul D
> Sent: Wednesday, May 1, 2024 8:56 AM
> To: pgsql-hackers(at)lists(dot)postgresql(dot)org
> Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>; Shankaran, Akash
> <akash(dot)shankaran(at)intel(dot)com>
> Subject: Proposal for Updating CRC32C with AVX-512 Algorithm.
>
> Hi,
>
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres with an optimized AVX-512 algorithm [0], we observed significant
> gains: an average speedup of ~6.6X, measured on 3 different Intel
> products. Details below. The AVX-512 algorithm in C is a port of the
> ISA-L library [1] assembler code.
>
> Workload call size distribution details (write heavy):
> * Average was approximately 1,010 bytes per call
> * ~80% of the calls were under 256 bytes
> * ~20% of the calls were greater than or equal to 256 bytes up to the max
> buffer size of 8192
>
> The 256-byte threshold is important because if the buffer is smaller, it
> makes sense to fall back to the existing implementation. This is because the
> AVX-512 algorithm needs a minimum of 256 bytes to operate.
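
The length-based fallback described above could be sketched roughly as follows. This is only an illustration, not the proposed patch: the names `crc32c_fallback`, `crc32c_avx512_placeholder`, `crc32c_dispatch`, `crc32c_compute`, `have_avx512` and `AVX512_MIN_LEN` are all hypothetical, the "existing implementation" is represented by a portable bitwise CRC32C rather than the SSE4.2 code, and the AVX-512 kernel is a stub that delegates to the fallback so the sketch stays runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold from the proposal: shorter buffers use the existing path. */
#define AVX512_MIN_LEN 256

/* Hypothetical flag; a real build would set it from a CPUID-style check. */
static int have_avx512 = 0;

/*
 * Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Stand-in for the existing SSE4.2 path; updates a running CRC value.
 */
static uint32_t
crc32c_fallback(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Placeholder for the AVX-512 kernel (assumption: same signature as the
 * fallback). It delegates to the fallback, so results are identical.
 */
static uint32_t
crc32c_avx512_placeholder(uint32_t crc, const void *data, size_t len)
{
    return crc32c_fallback(crc, data, len);
}

/* Pick the code path per call, based on CPU support and buffer length. */
static uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (have_avx512 && len >= AVX512_MIN_LEN)
        return crc32c_avx512_placeholder(crc, data, len);
    return crc32c_fallback(crc, data, len);
}

/* One-shot helper: standard CRC32C initial value and final inversion. */
static uint32_t
crc32c_compute(const void *data, size_t len)
{
    return crc32c_dispatch(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

With this shape, `crc32c_compute("123456789", 9)` yields the standard CRC32C check value `0xE3069283` regardless of which path is taken. The per-call length test is the point of the sketch: it is what lets the sub-256-byte calls keep using the current code.
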
>
> Using the above workload data distribution,
> at 0% calls < 256 bytes, an 841% improvement on average for crc32c
> functionality was observed.
> at 50% calls < 256 bytes, a 758% improvement on average for crc32c
> functionality was observed.
> at 90% calls < 256 bytes, a 44% improvement on average for crc32c
> functionality was observed.
> at 97.6% calls < 256 bytes, the workload's crc32c performance breaks even.
> at 100% calls < 256 bytes, a 14% regression is seen when using the AVX-512
> implementation.
>
> The results above are averages over 3 machines, measured on Intel
> Sapphire Rapids bare metal and, using EC2 on AWS cloud, on Intel Sapphire
> Rapids (m7i.2xlarge) and Intel Ice Lake (m6i.2xlarge).
>
> Summary Data (Sapphire Rapids bare metal, AWS m7i-2xl, and AWS m6i-2xl):
> +---------------------+-------------------+-------------------+-------------------+--------------------+
> | Rates in Bytes/us   |    Bare Metal     |    AWS m6i-2xl    |    AWS m7i-2xl    |                    |
> | (Larger is Better)  +---------+---------+---------+---------+---------+---------+ Overall Multiplier |
> |                     | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 |                    |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 256-8192    |  12,046 |  83,196 |   7,471 |  39,965 |  11,867 |  84,589 |               6.62 |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 64 - 255    |  16,865 |  15,909 |   9,209 |   7,363 |  12,496 |  10,046 |               0.86 |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Weighted Multiplier [*]     |               1.44 |
> +-----------------------------+--------------------+
>
> There was no evidence of AVX-512 frequency throttling from perf data, which
> stayed steady during the test.
>
> Feedback on this proposed improvement is appreciated. Some questions:
> 1) This AVX-512 ISA-L derived code uses the BSD-3 license [2]. Is this compatible
> with the PostgreSQL License [3]? They both appear to be very permissive
> licenses, but I am not an expert on licenses.
> 2) Is there a preferred benchmark I should run to test this change?
>
> If licensing is a non-issue, I can post the initial patch along with my Postgres
> benchmark function patch for further review.
>
> Thanks,
> Paul
>
> [0] https://www.researchgate.net/publication/263424619_Fast_CRC_computation#full-text
> [1] https://github.com/intel/isa-l
> [2] https://opensource.org/license/bsd-3-clause
> [3] https://opensource.org/license/postgresql
>
> [*] Weights used were 90% of requests less than 256 bytes, 10% greater than
> or equal to 256 bytes.
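
The post does not say how the multipliers in the summary table were derived. One reading that reproduces them is sketched below: each "Overall Multiplier" is the ratio of the summed AVX-512 rates to the summed SSE 4.2 rates across the three machines, and the weighted multiplier is a linear blend of the two using the 90%/10% weights from the footnote. The function names are hypothetical and this model is an assumption. It matches the 6.62, 0.86, 1.44 and 97.6% break-even figures, but not the 0% and 50% figures in the list.

```c
#include <assert.h>

/* Rates in bytes/us from the summary table: bare metal, m6i-2xl, m7i-2xl. */
static const double sse_large[3] = {12046, 7471, 11867};   /* 256-8192 bytes */
static const double avx_large[3] = {83196, 39965, 84589};
static const double sse_small[3] = {16865, 9209, 12496};   /* 64-255 bytes */
static const double avx_small[3] = {15909, 7363, 10046};

static double
sum3(const double v[3])
{
    return v[0] + v[1] + v[2];
}

/* Assumed definition: ratio of summed AVX-512 rates to summed SSE 4.2 rates. */
static double
overall_multiplier(const double avx[3], const double sse[3])
{
    return sum3(avx) / sum3(sse);
}

/* Linear blend for a workload where frac_small of calls are < 256 bytes. */
static double
weighted_multiplier(double frac_small, double m_small, double m_large)
{
    assert(frac_small >= 0.0 && frac_small <= 1.0);
    return frac_small * m_small + (1.0 - frac_small) * m_large;
}

/* Fraction of small calls at which the blended multiplier equals 1.0. */
static double
break_even_fraction(double m_small, double m_large)
{
    return (m_large - 1.0) / (m_large - m_small);
}
```

Under these assumptions, `overall_multiplier(avx_large, sse_large)` is about 6.62, `overall_multiplier(avx_small, sse_small)` is about 0.86, `weighted_multiplier(0.9, ...)` on those two values is about 1.44, and `break_even_fraction(...)` is about 0.976.
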