From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-02-17 10:58:01 |
Message-ID: | CANWCAZb-3PZMwG3LvG_TcfuTZm3aRY7LgY-bO9-9tso=z0FgXQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 13, 2025 at 5:19 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Feb 12, 2025 at 10:12:20PM +0000, Devulapalli, Raghuveer wrote:
> >> Well, I suspect the AVX-512 version will pretty much always need the runtime
> >> check given that its not available on a lot of newer hardware and requires a
> >> bunch of extra runtime checks (see pg_popcount_avx512.c). But it might be
> >> worth doing for PCLMUL. Otherwise, I think we'd have to leave out the PCLMUL
> >> optimizations if built with -msse4.2 -mpclmul because we don't want to regress
> >> existing -msse4.2 users with a runtime check.
> >
> > Sounds good to me. Although, users building with just -msse4.2 will now encounter an
> > an additional pclmul runtime check. That would be a regression unless they update to
> > building with both -msse4.2 and -mpclmul.
>
> My thinking was that building with just -msse4.2 would cause the existing
> SSE 4.2 implementation to be used (without the function pointer). That's
> admittedly a bit goofy because they'd miss out on the PCLMUL optimization,
> but things at least don't get any worse for them.
I tried using branching for the runtime check, and this looks like the
way to go:
- Existing -msse4.2 builders will still call directly, but inside the
function there is a length check and only for long input will it do a
runtime check for pclmul.
- This smooths the way for -msse4.2 (and the equivalent on Arm) to
inline calls with short constant input (e.g. WAL insert lock),
although I've not done that here.
- This can be a simple starting point for consolidating runtime
checks, as was proposed for popcount in the AVX-512 CRC thread, but
with branching my model was Andres' sketch here:
https://www.postgresql.org/message-id/20240731023918.ixsfbeuub6e76one%40awork3.anarazel.de
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v6-0002-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch | text/x-patch | 6.4 KB |
v6-0003-Improve-CRC32C-performance-on-x86_64.patch | text/x-patch | 8.3 KB |
v6-0001-Dispatch-CRC-computation-by-branching-rather-than.patch | text/x-patch | 19.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Jones | 2025-02-17 11:12:21 | Re: [PATCH] Add CANONICAL option to xmlserialize |
Previous Message | Shubham Khanna | 2025-02-17 10:53:30 | Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size |