RE: Improve CRC32C performance on SSE4.2

From: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: RE: Improve CRC32C performance on SSE4.2
Date: 2025-02-26 20:53:06
Message-ID: PH8PR11MB8286B017901C2C30665B106CFBC22@PH8PR11MB8286.namprd11.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Attached v10 with minor changes based on the feedback.

> Thanks, I think this is a good direction. Some comments:
>
> #if defined(HAVE__GET_CPUID)
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID)
> __cpuid(exx, 1); #endif

Oh yeah good catch. Fixed in v10.

> + }
> + /* avx512 os support */
> + if (zmm_regs_available()) {
> + return;
> + }
>
> // BTW, I do like the gating here that reduces the number of places that have to
> know about zmm and xsave. (Side note: shouldn't that be "if
> !(zmm_regs_available())"?)

Yup, good catch again 😊

> What do you think?

Yup, those look correct to me. Fixed them in v10.

> + PG_CPU_FEATURE_PCLMUL = 3,
> [...]
> + PG_CPU_FEATURE_ARMV8_CRC32C = 100,
>
> I'm not a fan of exposing these architecture-specific details to places that consult
> the capabilities. That requires things like
>
> +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
> [...]
> +#define PGCPUCAP_CRC32C
> pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
>
> I'd prefer to have 1 capability <-> one place to check at runtime for architectures
> that need that, and to keep architecture details private to the initialization step. .
> Even for things that test for which function pointer to use, I think it's a cleaner
> interface to look at one thing.

Isn't that one thing currently pg_cpu_have(FEATURE)? The reason we have the additional PGCPUCAP_CRC32C is because the dispatch for CRC32C is currently defined in the header file common to both ARM and SSE42. We should ideally have separate dispatch for ARM and x86 in their own files (the way it is in the main branch). This also makes it easier to add more implementations in the future without having to make the dispatch function work for both ARM and x86.

> If we're going to have a single file for the init step, we don't need this -- we'd just
> have a different definition of
> pg_cpucap_initialize() in each part, with a default that only adds the "init" slot:
>
> #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) ||
> defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
>
> <cpuid checks>
>
> #elif defined(__arm__) || defined(__arm) ||
> defined(__aarch64__) || defined(_M_ARM64)
>
> <for now only pg_crc32c_armv8_available()>
>
> #else
> <only init>
> #endif

Makes sense. Got rid of it in v10.

Raghuveer

Attachment Content-Type Size
v10-0001-Dispatch-CRC-computation-by-branching-rather-tha.patch application/octet-stream 12.6 KB
v10-0002-Rename-CRC-choose-files-to-cpucap-files.patch application/octet-stream 9.3 KB
v10-0003-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch application/octet-stream 6.5 KB
v10-0004-Improve-CRC32C-performance-on-x86_64.patch application/octet-stream 9.3 KB
v10-0005-Move-all-cpuid-checks-to-one-location.patch application/octet-stream 20.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-02-26 20:57:44 Re: Statistics Import and Export
Previous Message Dmytro Astapov 2025-02-26 20:44:50 Re: Relaxing constraints on BitmapAnd eligibility?