From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-02-12 13:06:35 |
Message-ID: | CANWCAZZZWQc3VQauZ2bCPxDxKpjKbV7z=zntGu3zD1HgT76Fdw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 12, 2025 at 4:34 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
> > On my machine that still regresses compared to master in that range (although by
> > not as much) so I still think 128 bytes is the right threshold.
>
> On my TGL, buffer sizes as small as 64 bytes see performance benefits.
Yeah, I'm guessing it's because newer chips will have better IPC. We
need to take care not to regress for hardware that's only 5-10 years
old.
Attached is v4:
0001: Same perf test, just in case
0002-4: I redid adding the implementation (without the single-block
loop and with 128-byte threshold), but split out the steps for
reference, as a model for possible ARM support in the future. These
will all be squashed on commit. The upstream code has very long lines,
even after running pgindent, so some may find that objectionable. We
could easily turn some commas into semicolons, and then it'll wrap
more nicely. I just wanted to change as little as possible for now. (I
also need to check if I need to put more license text here..)
0005: This has a fleshed-out draft commit message, but otherwise is
just the same configure/choose support as v3.
Some review comments:
1. Some of the comments that only mention SSE 4.2 in the compile- and
run-time checks need to be updated.
> > Okay, Nehalem is 17 years old, and the additional cpuid check would still work on
> > hardware 14-15 years old, so I think it's fine to bump the requirement for runtime
> > hardware support.
>
> Sounds good. I updated the runtime check to include PCLMULQDQ. New algorithm will run only on Westmere and newer CPU.
2. Unfortunately, there is another wrinkle that I failed to consider: If you
search the web for "VirtualBox pclmulqdq" you can see a few reports from not
very long ago that some hypervisors don't enable the CPUID for pclmul. I
don't know how big a problem that is in practice today, but it seems we
should actually have separate checks, with fallback. Sorry I didn't
think of this earlier.
3. Note: I left out the new test file from v3-0001. We should have
tests, but note we already have some CRC tests in
src/test/regress/sql/strings.sql -- let's put new ones there. Also,
for the longer strings we want to test, it's easier to read/verify to
use something like
SELECT crc32c(repeat('A', 128)::bytea);
Maybe it's sufficient to have 127, 128, 129 for lengths, and maybe a
couple more.
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v4-0002-Vendor-SSE-implementation-from-https-github.com-c.patch | text/x-patch | 3.5 KB |
v4-0005-Improve-CRC32C-performance-on-x86_64.patch | text/x-patch | 5.5 KB |
v4-0004-Run-pgindent-XXX-Some-lines-are-still-really-long.patch | text/x-patch | 4.7 KB |
v4-0003-Adjust-previous-commit-to-match-our-style-add-128.patch | text/x-patch | 2.8 KB |
v4-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch | text/x-patch | 6.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2025-02-12 13:37:04 | Re: Question about UpdateFullPageWrites() at the end of recovery. |
Previous Message | Alvaro Herrera | 2025-02-12 12:57:47 | Re: [PATCH] Optionally record Plan IDs to track plan changes for a query |