Re: Improve CRC32C performance on SSE4.2

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-03-11 11:13:05
Message-ID: CANWCAZa+dtOQ50V6s85jR-B53P8MsMcGiZ9WGLAGEGiE_TtyGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 11, 2025 at 4:47 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> > On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> Overall, I wish we could avoid splitting things into separate files and
> >> adding more header file gymnastics, but maybe there isn't much better we
> >> can do without overhauling the CPU feature detection code.
> >
> > I wanted to make an attempt to make this aspect nicer. v13-0002
> > incorporates deliberately compact and simple loops for inlined
> > constant input into the dispatch function, and leaves the existing
> > code alone. This avoids code churn and saves vertical space in the
> > copied code. It needs a bit more commentary, but I hope this is a more
> > digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> > be simpler if we can always assume non-constant input can go through a
> > function pointer.
>
> That is certainly more readable. FWIW I think it would be entirely
> reasonable to replace the pg_crc32c_sse42.c implementation with a call to
> this new pg_comp_crc32c_dispatch() function. Of course, you'd have to
> split things up like:
> [snip]

That could work as well. I'm thinking if we do PMULL on Arm, it might
be advantageous to keep the inline path and function paths with
distinct coding -- because of the pickier alignment on that platform,
it might not be worth pre-aligning the pointer to 8 bytes for a
20-byte constant input.

I've gone ahead and added the generated AVX-512 algorithm in v14-0005,
and added the build support and some of the runtime support from Paul
and Raghuveer's earlier patches in 0006-7. It passes CI, but I'll have
to arrange access to other hardware to verify the runtime behavior. I
think the Meson support is most of the way there, but it looks like
configure.ac got whacked around cosmetically quite a bit. If we feel
it's time to refactor things there, we'll want to split that out. In
any case, for autoconf I've pretty much kept the earlier work for now.

--
John Naylor
Amazon Web Services

Attachment Content-Type Size
v14-0004-Always-do-runtime-check-for-x86-to-simplify-PCLM.patch text/x-patch 5.0 KB
v14-0008-Temp-fixup-build-of-benchmark-on-Windows.patch text/x-patch 1.1 KB
v14-0007-AVX-512-CRC-autoconf.patch text/x-patch 18.1 KB
v14-0005-Add-runtime-support-for-AVX-512-CRC.patch text/x-patch 2.8 KB
v14-0006-AVX-512-CRC-Meson.patch text/x-patch 7.6 KB
v14-0002-Inline-CRC-computation-for-fixed-length-input.patch text/x-patch 1.8 KB
v14-0003-Improve-CRC32C-performance-on-x86_64.patch text/x-patch 8.1 KB
v14-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch text/x-patch 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2025-03-11 11:32:09 Re: Reducing the log spam
Previous Message Amul Sul 2025-03-11 11:08:20 Re: bogus error message for ALTER TABLE ALTER CONSTRAINT