From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-03-10 08:48:31 |
Message-ID: | CANWCAZabmia25iu8Z_qRhLKoOV1VxhcSMkJuzDomQrA2RWdTUA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> Overall, I wish we could avoid splitting things into separate files and
> adding more header file gymnastics, but maybe there isn't much better we
> can do without overhauling the CPU feature detection code.

I wanted to try to make this aspect nicer. v13-0002
incorporates deliberately compact and simple loops for inlined
constant input into the dispatch function, and leaves the existing
code alone. This avoids code churn and saves vertical space in the
copied code. It needs a bit more commentary, but I hope this is a more
digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
be simpler if we can always assume non-constant input can go through a
function pointer.
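
To illustrate the shape described above, here is a minimal, portable sketch. It is not the code from v13-0002: all names (`crc32c_byte`, `crc32c_generic`, `crc32c_impl`, `crc32c_dispatch`) and the 32-byte threshold are assumptions made for this example, and a plain bitwise CRC-32C step stands in for the SSE4.2 instructions so that it builds without special compiler flags. The point is only the control flow: small compile-time-constant lengths get a compact inlined loop, and everything else goes through a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bitwise CRC-32C step (reflected Castagnoli polynomial 0x82F63B78).
 * Stand-in for a hardware CRC instruction; hypothetical helper. */
static inline uint32_t
crc32c_byte(uint32_t crc, unsigned char b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Out-of-line implementation for arbitrary (non-constant) input. */
static uint32_t
crc32c_generic(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
        crc = crc32c_byte(crc, *p++);
    return crc;
}

/* Function pointer that a runtime CPU check could repoint at a faster
 * implementation; here it always targets the generic one. */
static uint32_t (*crc32c_impl) (uint32_t, const void *, size_t) = crc32c_generic;

/* Dispatch: inline a compact loop when the length is a small compile-time
 * constant, otherwise call through the pointer. The threshold of 32 is an
 * arbitrary choice for this sketch. */
static inline uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_constant_p(len) && len < 32)
    {
        const unsigned char *p = data;

        for (size_t i = 0; i < len; i++)
            crc = crc32c_byte(crc, p[i]);
        return crc;
    }
#endif
    return crc32c_impl(crc, data, len);
}

/* Both paths must agree on the standard CRC-32C check value. */
static void
crc32c_selfcheck(void)
{
    uint32_t crc = crc32c_dispatch(0xFFFFFFFFu, "123456789", 9) ^ 0xFFFFFFFFu;

    assert(crc == 0xE3069283u);
}
```

Because both branches compute the same function, the caller does not need to know which one was taken; whether `__builtin_constant_p` actually evaluates to true depends on the compiler and optimization level.
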

I've re-attached the modified perf test from v12 (as v13-0003) in case
anyone wants to play with it, but named so that the CF bot can't
find it, since it breaks the tests in the original perf test (it's not
for commit anyway).

Adding back AVX-512 should be fairly mechanical, since Raghuveer and
Nathan have already done the work needed for that.

--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v13-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch | text/x-patch | 6.4 KB |
v13-0002-Inline-CRC-computation-for-fixed-length-input.patch | text/x-patch | 1.8 KB |
v13-0003-Attempt-to-make-benchmark-more-sensitive-to-late.patch.nocfbot | application/octet-stream | 1.4 KB |