From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-03-04 05:13:17 |
Message-ID: | CANWCAZb7LE8rXx1z9TSjMQFFgQG=G=C7xOBOnyYWz797iLUALQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 4, 2025 at 5:41 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
> Some feedback on v11:
>
> if ((exx[2] & (1 << 20)) != 0) /* SSE 4.2 */
> {
> pg_comp_crc32c = pg_comp_crc32c_sse42;
> #ifdef USE_PCLMUL_WITH_RUNTIME_CHECK
> if ((exx[2] & (1 << 1)) != 0) /* PCLMUL */
> pg_comp_crc32c = pg_comp_crc32c_pclmul;
> #endif
> }
> #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
> else
> pg_comp_crc32c = pg_comp_crc32c_sb8;
> #endif
>
> Is the #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK at the right place? Shouldn’t it guard SSE4.2 function pointer assignment?
Without this guard, SSE builds will complain during link time of an
undefined reference to pg_comp_crc32c_sb8. We could instead just build
that file everywhere for simplicity, but it'll just take up space and
never get called. (Maybe that's okay because with the
runtime-branching approach we experimented with earlier, it would
always have to be built.)
> When building with meson, it looks like we build with -O2 and that is not enough for the compiler to unroll the SSE42 CRC32C loop.It requires -O3 or -O2 with -funroll-loops (see https://gcc.godbolt.org/z/4Eaq981aT) Perhaps we should check disassembly to see if the unroll is really happening on constant input?
That example uses 32 bytes -- fiddling with it a bit, for 31 or less
it gets unrolled. The case we care about most is currently 20 bytes.
We don't want to force unrolling loops in the general case -- that's
normally used for longer input and this is for short input.
> Also, the reason you have pg_comp_crc32c_sse42_inline defined separately in a header file is because you want to (a) inline the function and (b) unroll for constant inputs. Couldn't both of these be achieved by adding function __attribute__((always_inline)) on the pg_comp_crc32c_sse42 function with the added -funroll-loops compiler flag?
And (c) prevent it from being inlined in builds that need a runtime check.
I briefly tried the attribute approach and it doesn't work for me. If
you can get it to work, go ahead and share how that's done, but keep
in mind that we're not gcc/clang only -- it also has to work for
MSVC's "__forceinline"...
--
John Naylor
Amazon Web Services
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2025-03-04 05:34:14 | Re: proposal - plpgsql - support standard syntax for named arguments for cursors |
Previous Message | John Naylor | 2025-03-04 05:09:09 | Re: Improve CRC32C performance on SSE4.2 |