From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-04-01 10:33:02 |
Message-ID: | CANWCAZZ_+65KW3qUks2y15142zLYLfx0JK-wWWCJziDkR971Eg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 27, 2025 at 2:55 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
>
> Hello John,
>
> v15 LGTM. Couple of minor comments:
>
> > I'm leaning towards a length limit for v15-0001 so that inlined instructions are
> > likely to be unrolled. Aside from lack of commit message, I think that one is ready
> > for commit soon-ish.
>
> +1
Thanks for looking! This has been committed.
> > I'm feeling pretty good about 0002, but since there is still room for cosmetic
> > fiddling, I want to let it sit for a bit longer.
>
> (1) zmm_regs_available() in pg_crc32c_sse42_choose.c is a duplicate of pg_popcount_avx512.c but perhaps that’s fine for now. I will propose a consolidated SIMD runtime check in a follow up patch.
Yeah, I was thinking a small amount of duplication is tolerable.
As far as possible better abstraction for next cycle, I went looking
and found this, which has some features we've talked about:
https://zolk3ri.name/cgit/cpudetect/tree/cpudetect.h
> (2) Might be apt to rename pg_crc32c_sse42*.c to pg_crc32c_x86*.c since they contain both sse42 and avx512 versions.
The name is now not quite accurate, but it's not exactly misleading
either. I'm leaning towards keeping it the same, so for now I've just
updated the header comment.
For v16, I made another pass through made some more mostly superficial
adjustments:
- copied rewritten Meson comment to configure.ac
- added some more #include guards out of paranoia
- added tests with longer lengths that exercise the new paths
- adjusted configure / meson messages for consistency
- changed not-quite-accurate wording about "AVX-512 CRC instructions"
- used "PCLMUL" only when talking about specific intrinsics and prefer
"AVX-512" elsewhere, to head off potential future confusion with Arm
PMULL.
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v16-0001-Improve-CRC32C-performance-on-recent-x86_64-plat.patch | application/x-patch | 25.6 KB |
v16-0002-Add-debug-for-CI-XXX-not-for-commit.patch | application/x-patch | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2025-04-01 10:34:53 | Re: AIO writes vs hint bits vs checksums |
Previous Message | Andrew Dunstan | 2025-04-01 10:32:34 | Re: [PATCH] Fix build on MINGW on ARM64 |