From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-02-26 05:22:25 |
Message-ID: | CANWCAZbG_pPFuLRaczYApBykaHCXQ-STAXO1=J6_4TRvy23Byg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 26, 2025 at 7:21 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
>
> > I agree it would be preferable to make a centralized check work.
>
> Here is my first stab at it. v9 is same as v8 + a commit to move all cpuid checks into one single place including the AVX512 popcount code. Any new feature that requires CPUID information can access that information with pg_cpu_have[FEATURE] defined in pg_cpucap.h and initialized in pg_cpucap.c. v8 also had a typo in configure files which caused a build failure. Its fixed in v9.
>
> Pretty sure the ARM code paths need some correction. Let me know what you think.
Thanks, I think this is a good direction. Some comments:
+static void pg_cpuid(int leaf, int subleaf, unsigned int* exx)
+{
+#if defined(HAVE__GET_CPUID_COUNT)
+ __get_cpuid_count(leaf, subleaf, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUIDEX)
+ __cpuidex(exx, leaf, subleaf);
+#else
+#error cpuid instruction not available
+#endif
+}
Our current configure still looks for __get_cpuid and __cpuid. We
committed checking these new ones fairly recently, and they were
further gated by USE_AVX512_POPCNT_WITH_RUNTIME_CHECK. It seems like
here we should do something like the following, where "+" lines are
from the patch and other lines are mine:
+static void
+pg_cpucap_x86(void)
+{
+ unsigned int exx[4] = {0, 0, 0, 0};
#if defined(HAVE__GET_CPUID)
__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#elif defined(HAVE__CPUID)
__cpuid(exx, 1);
#endif
+ pg_cpucap[PG_CPU_FEATURE_SSE42] = (exx[2] & (1 << 20)) != 0;
+ pg_cpucap[PG_CPU_FEATURE_PCLMUL] = (exx[2] & (1 << 1)) != 0;
+ pg_cpucap[PG_CPU_FEATURE_POPCNT] = (exx[2] & (1 << 23)) != 0;
+ /* osxsave */
+ if ((exx[2] & (1 << 27)) == 0) {
+ return;
+ }
+ /* avx512 os support */
+ if (zmm_regs_available()) {
+ return;
+ }
// BTW, I do like the gating here that reduces the number of places
that have to know about zmm and xsave. (Side note: shouldn't that be
"if !(zmm_regs_available())"?)
+ /* reset for second cpuid call on leaf 7 to check extended avx512
support */
exx[4] = {0, 0, 0, 0};
#if defined(HAVE__GET_CPUID_COUNT)
__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
#elif defined(HAVE__CPUIDEX)
__cpuidex(exx, 7, 0);
#endif
+ pg_cpucap[PG_CPU_FEATURE_AVX512F] = (exx[1] & (1 << 16)) != 0;
+ pg_cpucap[PG_CPU_FEATURE_AVX512BW] = (exx[1] & (1 << 30)) != 0;
+ pg_cpucap[PG_CPU_FEATURE_AVX512VPOPCNTDQ] = (exx[2] & (1 << 14)) != 0;
+
+}
What do you think?
-#define PGCPUCAP_INIT (1 << 0)
-#define PGCPUCAP_POPCNT (1 << 1)
-#define PGCPUCAP_VPOPCNT (1 << 2)
-#define PGCPUCAP_CRC32C (1 << 3)
-#define PGCPUCAP_CLMUL (1 << 4)
+enum pg_cpucap__
+{
+ PG_CPU_FEATURE_INIT = 0,
+ // X86
+ PG_CPU_FEATURE_SSE42 = 1,
+ PG_CPU_FEATURE_POPCNT = 2,
+ PG_CPU_FEATURE_PCLMUL = 3,
[...]
+ PG_CPU_FEATURE_ARMV8_CRC32C = 100,
I'm not a fan of exposing these architecture-specific details to
places that consult the capabilities. That requires things like
+#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
[...]
+#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
I'd prefer to have 1 capability <-> one place to check at runtime for
architectures that need that, and to keep architecture details private
to the initialization step. . Even for things that test for which
function pointer to use, I think it's a cleaner interface to look at
one thing.
+static void
+pg_cpucap_arch()
+{
+ /* WIP: configure checks */
+#ifdef __x86_64__
+ pg_cpucap_x86();
+#else // ARM:
+ pg_cpucap_arm();
+#endif
+}
If we're going to have a single file for the init step, we don't need
this -- we'd just have a different definition of
pg_cpucap_initialize() in each part, with a default that only adds the
"init" slot:
#if defined( __i386__ ) || defined(i386) || defined(_M_IX86) ||
defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
<cpuid checks>
#elif defined(__arm__) || defined(__arm) ||
defined(__aarch64__) || defined(_M_ARM64)
<for now only pg_crc32c_armv8_available()>
#else
<only init>
#endif
--
John Naylor
Amazon Web Services
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-02-26 05:27:44 | Re: Fix logging for invalid recovery timeline |
Previous Message | Paul Jungwirth | 2025-02-26 05:15:43 | Re: SQL:2011 application time |