From: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | RE: Improve CRC32C performance on SSE4.2 |
Date: | 2025-03-03 22:40:27 |
Message-ID: | PH8PR11MB8286F38A75AE71EE3A856E64FBC92@PH8PR11MB8286.namprd11.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi John,
> You raised some interesting points, which deserve a thoughtful response. After
> sleeping on it, however I came to the conclusion that a sweeping change in
> runtime checks, with either of our approaches, has downsides and unresolved
> questions. Perhaps we can come back to it at a later time.
Sounds good to me. I did get derailed into something beyond the scope of this patch. I will make a separate proposal once this is merged.
> Here's what I came up with in v11:
Some feedback on v11:
if ((exx[2] & (1 << 20)) != 0) /* SSE 4.2 */
{
pg_comp_crc32c = pg_comp_crc32c_sse42;
#ifdef USE_PCLMUL_WITH_RUNTIME_CHECK
if ((exx[2] & (1 << 1)) != 0) /* PCLMUL */
pg_comp_crc32c = pg_comp_crc32c_pclmul;
#endif
}
#ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
else
pg_comp_crc32c = pg_comp_crc32c_sb8;
#endif
Is the #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK at the right place? Shouldn’t it guard SSE4.2 function pointer assignment?
/* WIP: configure checks */
#ifdef __x86_64__
#define USE_PCLMUL_WITH_RUNTIME_CHECK
#endif
Minor consideration: gcc 4.3 (released in 2011) is the only compiler that supports -msse4.2 and not -mpclmul. gcc >= 4.4 supports both. If you are okay causing a regression on gcc4.3, we could combine USE_PCLMUL_WITH_RUNTIME_CHECK with USE_SSE42_CRC32C_WITH_RUNTIME_CHECK into a single macro to reduce the number of #ifdef's in the codebase and simplify configure/meson compiler checks.
> 0001: same benchmark module as before
> 0002: For SSE4.2 builds, arrange so that constant input uses an inlined path so
> that the compiler can emit unrolled loops anywhere.
When building with meson, it looks like we build with -O2 and that is not enough for the compiler to unroll the SSE42 CRC32C loop. It requires -O3 or -O2 with -funroll-loops (see https://gcc.godbolt.org/z/4Eaq981aT) Perhaps we should check disassembly to see if the unroll is really happening on constant input?
Also, the reason you have pg_comp_crc32c_sse42_inline defined separately in a header file is because you want to (a) inline the function and (b) unroll for constant inputs. Couldn't both of these be achieved by adding function __attribute__((always_inline)) on the pg_comp_crc32c_sse42 function with the added -funroll-loops compiler flag?
Raghuveer
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-03-03 23:00:52 | Re: [PATCH] SVE popcount support |
Previous Message | Andres Freund | 2025-03-03 22:39:27 | Re: Adding NetBSD and OpenBSD to Postgres CI |