From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: CRC32C Parallel Computation Optimization on ARM |
Date: | 2024-12-11 16:54:27 |
Message-ID: | Z1nDwz1OubIgk9oX@nathan |
Lists: | pgsql-hackers |
On Wed, Dec 11, 2024 at 02:08:58PM +0700, John Naylor wrote:
> I added a port to x86 and poked at it, with the intent to have an easy
> on-ramp that at least accelerates computation of CRCs on FPIs.
>
> The 0008 patch only worked on chunks of 1024 at a time. At that size,
> the presence of hardware carryless multiplication is not that
> important. I removed the hard-coded constants in favor of a lookup
> table, so now it can handle anything up to 8400 bytes in a single
> pass.
>
> There are still some "taste" issues, but I like the overall shape here
> and how light it was. With more hardware support, we can go much lower
> than 1024 bytes, but that can be left for future work.
Nice. I'm curious how this compares to both the existing implementations
and the proposed ones that require new intrinsics. I like the idea of
avoiding new runtime and config checks, especially if the performance is
somewhat comparable for the most popular cases (i.e., dozens of bytes to a
few thousand bytes).

If we still want to add new intrinsics, would it be easy enough to add them
on top of this patch? Or would it require further restructuring?
--
nathan
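
For readers following the thread, the chunk-and-combine idea under discussion can be sketched in plain Python. This is an illustration only and is not code from the patch: the real implementations rely on hardware CRC instructions and carryless multiplication, whereas this sketch does everything in software. The function names (`crc32c`, `gf2_mul`, `x_pow_bytes`, `crc32c_combine`, `crc32c_chunked`) are hypothetical, and the chunk size of 1024 is taken from the quoted message only as a default. The key point is that the CRC of a concatenation A||B equals crc(A) multiplied by x^(8*len(B)) modulo the CRC polynomial, XORed with crc(B), so chunks can be processed independently and merged afterwards.

```python
POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reflected form


def _make_table():
    """Build the usual 256-entry byte-at-a-time lookup table."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _make_table()


def crc32c(data, crc=0):
    """Plain table-driven CRC-32C over a bytes-like object."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gf2_mul(a, b):
    """Multiply a and b as polynomials over GF(2), modulo POLY.

    Values use the reflected bit order: bit 31 is x^0, bit 0 is x^31.
    """
    product = 0
    mask = 1 << 31
    while mask:
        if a & mask:
            product ^= b
        # Multiply b by x, reducing modulo POLY when a term overflows.
        b = (b >> 1) ^ POLY if b & 1 else b >> 1
        mask >>= 1
    return product


def x_pow_bytes(n):
    """Return x^(8*n) mod POLY using square-and-multiply."""
    result = 1 << 31  # the polynomial "1"
    base = 1 << 23    # the polynomial x^8, i.e. a shift by one byte
    while n:
        if n & 1:
            result = gf2_mul(result, base)
        base = gf2_mul(base, base)
        n >>= 1
    return result


def crc32c_combine(crc_a, crc_b, len_b):
    """Given crc(A), crc(B) and len(B), return crc(A || B)."""
    return gf2_mul(x_pow_bytes(len_b), crc_a) ^ crc_b


def crc32c_chunked(data, chunk_size=1024):
    """Compute CRC-32C chunk by chunk, merging the partial results."""
    crc = 0  # CRC of the empty string
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        crc = crc32c_combine(crc, crc32c(chunk), len(chunk))
    return crc


# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283
```

In this sketch `x_pow_bytes` recomputes the shift constant on every call. A precomputed table of such constants indexed by chunk length would avoid that cost, which is one plausible reading of the lookup table mentioned in the quoted message, though the sketch makes no claim about how the patch actually lays out its table.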