From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-02-13 11:46:10 |
Message-ID: | CANWCAZaC6E4orMJTGZpYLYQCPPfdHAZLtQvLdjkrhxg9G=mxrA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Feb 13, 2025 at 4:18 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> I think the idea behind USE_SSE42_CRC32C is to avoid the function pointer
> overhead if possible. I looked at switching to always using runtime checks
> for this stuff, and we concluded that we'd better not [0].
>
> [0] https://postgr.es/m/flat/20231030161706.GA3011%40nathanxps13
For short lengths, I tried unrolling the loop into a switch statement,
as in the attached v5-0006 (the other new patches are fixes for CI).
That usually looks faster for me, but not on the length used under the
WAL insert lock (the 20-byte case in the numbers below). Usual caveat:
Using small, fixed lengths in benchmarks can be misleading, because
branches are more easily predicted.
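To illustrate the idea of unrolling a short-input loop into a switch statement, here is a minimal sketch. It is not the code from v5-0006. The function names (crc32c_u8, crc32c_u64, crc32c_loop, crc32c_short_switch) and the 32-byte cutoff are hypothetical, and the two step functions are portable bitwise stand-ins for the SSE4.2 crc32 instructions, so the example runs on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for a one-byte hardware CRC32C step
 * (bitwise, reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
crc32c_u8(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Portable stand-in for an eight-byte hardware CRC32C step. */
static uint32_t
crc32c_u64(uint32_t crc, const unsigned char *p)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_u8(crc, p[i]);
    return crc;
}

/* Reference: the ordinary loop form. */
static uint32_t
crc32c_loop(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8)
    {
        crc = crc32c_u64(crc, p);
        p += 8;
        len -= 8;
    }
    while (len > 0)
    {
        crc = crc32c_u8(crc, *p++);
        len--;
    }
    return crc;
}

/* Unrolled form for short inputs (hypothetical cutoff: len < 32).
 * Each switch jumps straight to the right number of steps and
 * falls through, so there is no loop counter to test. */
static uint32_t
crc32c_short_switch(uint32_t crc, const unsigned char *p, size_t len)
{
    assert(len < 32);

    switch (len / 8)
    {
        case 3: crc = crc32c_u64(crc, p); p += 8; /* fall through */
        case 2: crc = crc32c_u64(crc, p); p += 8; /* fall through */
        case 1: crc = crc32c_u64(crc, p); p += 8; /* fall through */
        default: break;
    }
    switch (len % 8)
    {
        case 7: crc = crc32c_u8(crc, *p++); /* fall through */
        case 6: crc = crc32c_u8(crc, *p++); /* fall through */
        case 5: crc = crc32c_u8(crc, *p++); /* fall through */
        case 4: crc = crc32c_u8(crc, *p++); /* fall through */
        case 3: crc = crc32c_u8(crc, *p++); /* fall through */
        case 2: crc = crc32c_u8(crc, *p++); /* fall through */
        case 1: crc = crc32c_u8(crc, *p++); /* fall through */
        default: break;
    }
    return crc;
}
```

Both forms compute the same value. The switch trades the loop's repeated length checks for two computed jumps, which is also why a benchmark that always passes the same small length can flatter it: the jump target never changes, so it is predicted perfectly.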
It seems that always using runtime checks would require branching,
rather than function pointers, as has been proposed elsewhere.
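A rough sketch of what runtime dispatch by branching, as opposed to an indirect call through a function pointer, could look like. This is an assumption about the general shape only, not code from any posted patch. All names here (crc32c_portable, crc32c_hw_standin, cpu_has_crc32c_hypothetical, crc_hw_state, crc32c_dispatch) are hypothetical, the CPU check is stubbed out, and the "hardware" path just calls the portable code so the example runs anywhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for the hardware path. A real build would use the CRC
 * instructions here; this sketch delegates to the portable code. */
static uint32_t
crc32c_hw_standin(uint32_t crc, const void *data, size_t len)
{
    return crc32c_portable(crc, data, len);
}

/* Hypothetical CPU feature probe, stubbed to "supported". */
static bool
cpu_has_crc32c_hypothetical(void)
{
    return true;
}

/* Cached probe result: -1 = not checked yet, 0 = no, 1 = yes. */
static int crc_hw_state = -1;

/* Dispatch with a plain branch on a cached flag. After the first
 * call the flag never changes, so the branch is well predicted, and
 * both callees stay visible to the compiler as direct calls. */
static inline uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
    if (crc_hw_state < 0)
        crc_hw_state = cpu_has_crc32c_hypothetical() ? 1 : 0;

    if (crc_hw_state)
        return crc32c_hw_standin(crc, data, len);
    return crc32c_portable(crc, data, len);
}
```

With a function pointer, every call is an indirect call that the compiler cannot inline. With a branch, the check is a load and a compare in front of a direct call.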
master:
20
latency average = 3.622 ms
latency average = 3.573 ms
latency average = 3.599 ms
64
latency average = 7.791 ms
latency average = 7.920 ms
latency average = 7.888 ms
80
latency average = 8.076 ms
latency average = 8.140 ms
latency average = 8.150 ms
96
latency average = 8.853 ms
latency average = 8.897 ms
latency average = 8.914 ms
112
latency average = 9.867 ms
latency average = 9.825 ms
latency average = 9.869 ms
v5:
20
latency average = 4.550 ms
latency average = 4.327 ms
latency average = 4.320 ms
64
latency average = 5.064 ms
latency average = 4.934 ms
latency average = 5.020 ms
80
latency average = 4.904 ms
latency average = 4.786 ms
latency average = 4.942 ms
96
latency average = 5.392 ms
latency average = 5.376 ms
latency average = 5.367 ms
112
latency average = 5.730 ms
latency average = 5.859 ms
latency average = 5.734 ms
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v5-0005-Improve-CRC32C-performance-on-x86_64.patch | application/x-patch | 5.5 KB |
v5-0008-Allow-dev-test-to-build-on-Windows-for-CI-XXX-not.patch | application/x-patch | 891 bytes |
v5-0006-Unroll-tail.patch | application/x-patch | 2.2 KB |
v5-0007-Fix-32-bit-build.patch | application/x-patch | 981 bytes |
v5-0004-Run-pgindent-XXX-Some-lines-are-still-really-long.patch | application/x-patch | 4.7 KB |
v5-0002-Vendor-SSE-implementation-from-https-github.com-c.patch | application/x-patch | 3.5 KB |
v5-0003-Adjust-previous-commit-to-match-our-style-add-128.patch | application/x-patch | 2.8 KB |
v5-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch | application/x-patch | 6.4 KB |