Re: Improve CRC32C performance on SSE4.2

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-03-03 19:11:15
Message-ID: Z8X-0wg7wXztjMQ2@nathan
Lists: pgsql-hackers

On Fri, Feb 28, 2025 at 07:11:29PM +0700, John Naylor wrote:
> 0002: For SSE4.2 builds, arrange so that constant input uses an
> inlined path so that the compiler can emit unrolled loops anywhere.
> This is particularly important for the WAL insertion lock, so this is
> possibly committable on its own just for that.

Nice.

> 0004: the PCLMUL path for SSE4.2 builds. This uses a function pointer
> for long-ish input and the same above inlined path for short input
> (whether constant or not). So it gets the best of both worlds.

I spent some time staring at pg_crc32.h with all these patches applied, and
IIUC it leads to the following behavior:

* For compiled-in SSE 4.2 builds, we branch based on the length. For
smaller inputs, we are using an inlined version of the SSE 4.2 code.
For larger inputs, we call a function pointer so that we can potentially
use the PCLMUL version. This could lead to a small regression for
machines with SSE 4.2 but not PCLMUL, but that may be uncommon enough
at this point to not worry about.

* For runtime-check SSE 4.2 builds, we choose slicing-by-8, SSE 4.2, or
SSE 4.2 with PCLMUL, and we always use a function pointer.
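
To make the first case concrete, here is a rough sketch of the length-based branch. It is not the code from the patches: every name and the 64-byte cutoff are hypothetical, and a portable bitwise CRC-32C loop stands in for both the inlined SSE 4.2 path and the routine behind the function pointer, so it compiles anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the actual patches choose their own value. */
#define SKETCH_INLINE_THRESHOLD 64

/*
 * Portable bitwise CRC-32C (reflected polynomial 0x82F63B78).  In this
 * sketch it stands in for the inlined SSE 4.2 code used on short inputs.
 */
static inline uint32_t
sketch_crc32c_small(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Out-of-line routine reached through the function pointer.  A real build
 * would point this at SSE 4.2 or PCLMUL code; here it reuses the loop above.
 */
static uint32_t
sketch_crc32c_large(uint32_t crc, const void *data, size_t len)
{
    return sketch_crc32c_small(crc, data, len);
}

/* Assumed to be set once, by whatever runtime check picks an implementation. */
static uint32_t (*sketch_crc32c_ptr) (uint32_t, const void *, size_t) =
    sketch_crc32c_large;

/* Short inputs stay inline; longer ones pay for the indirect call. */
static inline uint32_t
sketch_comp_crc32c(uint32_t crc, const void *data, size_t len)
{
    if (len < SKETCH_INLINE_THRESHOLD)
        return sketch_crc32c_small(crc, data, len);
    return sketch_crc32c_ptr(crc, data, len);
}
```

Callers would start from 0xFFFFFFFF and invert the result at the end, so the standard check string "123456789" should come out as 0xE3069283. In the runtime-check case, the branch disappears and every call goes through the pointer.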

The main question I have is whether we can simplify this by always using a
runtime check and by inlining slicing-by-8 for small inputs. That would be
dependent on the performance of slicing-by-8 and SSE 4.2 being comparable
for small inputs.
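
A minimal sketch of what that simplification might look like, under stated assumptions: small inputs always take an inlined software path, and larger inputs go through a pointer that resolves itself on first use. All names, the 32-byte cutoff, and the CPU probe are hypothetical. A bitwise loop stands in for slicing-by-8, and the "hardware" routine is a placeholder rather than real SSE 4.2 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff for the inlined software path. */
#define RT_SMALL_INPUT 32

typedef uint32_t (*rt_crc_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bitwise CRC-32C; a stand-in for slicing-by-8 in this sketch. */
static inline uint32_t
rt_crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Placeholder for an SSE 4.2 / PCLMUL routine; reuses the software loop. */
static uint32_t
rt_crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return rt_crc32c_sw(crc, data, len);
}

/* Hypothetical CPU probe; always reports "no" so the sketch stays portable. */
static int
rt_cpu_has_sse42(void)
{
    return 0;
}

static uint32_t rt_crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts at the chooser, which replaces itself on the first call. */
static rt_crc_fn rt_crc32c_large = rt_crc32c_choose;

static uint32_t
rt_crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    rt_crc32c_large = rt_cpu_has_sse42() ? rt_crc32c_hw : rt_crc32c_sw;
    return rt_crc32c_large(crc, data, len);
}

/* One code path for every build: inline when small, pointer otherwise. */
static inline uint32_t
rt_comp_crc32c(uint32_t crc, const void *data, size_t len)
{
    if (len < RT_SMALL_INPUT)
        return rt_crc32c_sw(crc, data, len);
    return rt_crc32c_large(crc, data, len);
}
```

Whether this is a net win depends on the point above: it only works out if the inlined software path is not noticeably slower than SSE 4.2 on short inputs.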

Overall, I wish we could avoid splitting things into separate files and
adding more header file gymnastics, but maybe there isn't much better we
can do without overhauling the CPU feature detection code.

--
nathan
