Re: Improve CRC32C performance on SSE4.2

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-02-13 11:46:10
Message-ID: CANWCAZaC6E4orMJTGZpYLYQCPPfdHAZLtQvLdjkrhxg9G=mxrA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 13, 2025 at 4:18 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> I think the idea behind USE_SSE42_CRC32C is to avoid the function pointer
> overhead if possible. I looked at switching to always using runtime checks
> for this stuff, and we concluded that we'd better not [0].
>
> [0] https://postgr.es/m/flat/20231030161706.GA3011%40nathanxps13

For short lengths, I tried unrolling the loop into a switch statement,
as in the attached v5-0006 (the other new patches are fixes for CI).
That usually looks faster for me, but not at the length used under the
WAL insert lock (20 bytes in the results below). Usual caveat: using
small fixed-size lengths in benchmarks can be misleading, because
branches are more easily predicted.
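
For readers without the patch at hand, the general shape of "unrolling the tail into a switch" can be sketched as below. This is not the code from v5-0006: the function names are hypothetical, the per-byte step is a portable bit-at-a-time CRC-32C rather than the SSE4.2 instruction, and the choice of an 8-byte bulk step is an assumption made only to give the switch a bounded number of cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-32C (reflected polynomial 0x82F63B78), bit at a time.
 * Stand-in for a hardware CRC instruction; hypothetical helper. */
static uint32_t
crc32c_byte(uint32_t crc, uint8_t b)
{
	crc ^= b;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Reference: plain byte loop over the whole input. */
static uint32_t
crc32c_loop(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--)
		crc = crc32c_byte(crc, *p++);
	return crc;
}

/* Same result, but the final 0..7 bytes are handled by one switch with
 * fallthrough instead of a counted loop. */
static uint32_t
crc32c_switch_tail(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	assert(data != NULL || len == 0);

	/* Bulk part: consume 8 bytes per iteration. */
	for (; len >= 8; len -= 8)
		for (int i = 0; i < 8; i++)
			crc = crc32c_byte(crc, *p++);

	/* Tail: jump to the case matching the remaining length. */
	switch (len)
	{
		case 7: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 6: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 5: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 4: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 3: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 2: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 1: crc = crc32c_byte(crc, *p++);	/* FALLTHROUGH */
		case 0: break;
	}
	return crc;
}
```

The switch replaces several loop-condition branches with a single computed jump. When a benchmark always passes the same length, that jump is trivially predictable, which is one reason fixed-length results may look better than a mixed workload would.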

If we were to always use runtime checks, it seems we'd need to
dispatch by branching rather than through function pointers, as has
been proposed elsewhere.
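
To illustrate what dispatching by branching could look like, here is a minimal sketch. It is not taken from any patch in this thread: all names are hypothetical, the hardware path is a simple byte-at-a-time loop, and it assumes GCC or Clang on x86_64 for `__builtin_cpu_supports` and the `target` attribute. On other platforms it compiles to the software path only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define SKETCH_HAVE_HW_CRC32C 1
#endif

/* Portable bit-at-a-time CRC-32C fallback (hypothetical name). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

#ifdef SKETCH_HAVE_HW_CRC32C
/* SSE4.2 path; the target attribute lets this one function use the
 * instruction even when the rest of the file is built without -msse4.2. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--)
		crc = _mm_crc32_u8(crc, *p++);
	return crc;
}

/* -1 = not probed yet, 0 = unavailable, 1 = available.
 * Not thread-safe as written; a real implementation would probe once
 * at startup. */
static int	crc32c_hw_state = -1;
#endif

/* Direct calls behind an ordinary branch; no function pointer involved. */
static uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
	assert(data != NULL || len == 0);
#ifdef SKETCH_HAVE_HW_CRC32C
	if (crc32c_hw_state < 0)
		crc32c_hw_state = __builtin_cpu_supports("sse4.2") ? 1 : 0;
	if (crc32c_hw_state)
		return crc32c_hw(crc, data, len);
#endif
	return crc32c_sw(crc, data, len);
}
```

The branch on `crc32c_hw_state` always goes the same way after the first call, so it should predict well, and both callees are visible to the compiler as direct calls. Whether that actually beats an indirect call for short inputs would need measuring.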

master:
20
latency average = 3.622 ms
latency average = 3.573 ms
latency average = 3.599 ms
64
latency average = 7.791 ms
latency average = 7.920 ms
latency average = 7.888 ms
80
latency average = 8.076 ms
latency average = 8.140 ms
latency average = 8.150 ms
96
latency average = 8.853 ms
latency average = 8.897 ms
latency average = 8.914 ms
112
latency average = 9.867 ms
latency average = 9.825 ms
latency average = 9.869 ms

v5:
20
latency average = 4.550 ms
latency average = 4.327 ms
latency average = 4.320 ms
64
latency average = 5.064 ms
latency average = 4.934 ms
latency average = 5.020 ms
80
latency average = 4.904 ms
latency average = 4.786 ms
latency average = 4.942 ms
96
latency average = 5.392 ms
latency average = 5.376 ms
latency average = 5.367 ms
112
latency average = 5.730 ms
latency average = 5.859 ms
latency average = 5.734 ms

--
John Naylor
Amazon Web Services

Attachment Content-Type Size
v5-0005-Improve-CRC32C-performance-on-x86_64.patch application/x-patch 5.5 KB
v5-0008-Allow-dev-test-to-build-on-Windows-for-CI-XXX-not.patch application/x-patch 891 bytes
v5-0006-Unroll-tail.patch application/x-patch 2.2 KB
v5-0007-Fix-32-bit-build.patch application/x-patch 981 bytes
v5-0004-Run-pgindent-XXX-Some-lines-are-still-really-long.patch application/x-patch 4.7 KB
v5-0002-Vendor-SSE-implementation-from-https-github.com-c.patch application/x-patch 3.5 KB
v5-0003-Adjust-previous-commit-to-match-our-style-add-128.patch application/x-patch 2.8 KB
v5-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch application/x-patch 6.4 KB
