Re: Improve CRC32C performance on SSE4.2

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-02-17 10:58:01
Message-ID: CANWCAZb-3PZMwG3LvG_TcfuTZm3aRY7LgY-bO9-9tso=z0FgXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 13, 2025 at 5:19 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Feb 12, 2025 at 10:12:20PM +0000, Devulapalli, Raghuveer wrote:
> >> Well, I suspect the AVX-512 version will pretty much always need the runtime
> >> check given that its not available on a lot of newer hardware and requires a
> >> bunch of extra runtime checks (see pg_popcount_avx512.c). But it might be
> >> worth doing for PCLMUL. Otherwise, I think we'd have to leave out the PCLMUL
> >> optimizations if built with -msse4.2 -mpclmul because we don't want to regress
> >> existing -msse4.2 users with a runtime check.
> >
> > Sounds good to me. Although, users building with just -msse4.2 will now encounter an
> > an additional pclmul runtime check. That would be a regression unless they update to
> > building with both -msse4.2 and -mpclmul.
>
> My thinking was that building with just -msse4.2 would cause the existing
> SSE 4.2 implementation to be used (without the function pointer). That's
> admittedly a bit goofy because they'd miss out on the PCLMUL optimization,
> but things at least don't get any worse for them.

I tried using branching for the runtime check, and this looks like the
way to go:
- Existing -msse4.2 builders will still call directly, but inside the
function there is a length check and only for long input will it do a
runtime check for pclmul.
- This smooths the way for -msse4.2 (and the equivalent on Arm) to
inline calls with short constant input (e.g. WAL insert lock),
although I've not done that here.
- This can be a simple starting point for consolidating runtime
checks, as was proposed for popcount in the AVX-512 CRC thread, but
with branching my model was Andres' sketch here:

https://www.postgresql.org/message-id/20240731023918.ixsfbeuub6e76one%40awork3.anarazel.de

--
John Naylor
Amazon Web Services

Attachment Content-Type Size
v6-0002-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch text/x-patch 6.4 KB
v6-0003-Improve-CRC32C-performance-on-x86_64.patch text/x-patch 8.3 KB
v6-0001-Dispatch-CRC-computation-by-branching-rather-than.patch text/x-patch 19.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2025-02-17 11:12:21 Re: [PATCH] Add CANONICAL option to xmlserialize
Previous Message Shubham Khanna 2025-02-17 10:53:30 Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size