From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Xiang Gao <Xiang(dot)Gao(at)arm(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: CRC32C Parallel Computation Optimization on ARM |
Date: | 2024-12-11 07:08:58 |
Message-ID: | CANWCAZaHcxPPZW8eyxwXt8JAyV34KfsPfzq3pceEv0Pi-AsY3Q@mail.gmail.com |
Lists: | pgsql-hackers |
I wrote:
>
> 1. I looked at a couple implementations of this idea, and found that
> the constants used in the carryless multiply are tied to the length of
> the blocks. With a lookup table we can do the 3-way algorithm on any
> portion of a full block length, rather than immediately fall to doing
> CRC serially. That would be faster on average. See for example
> https://github.com/komrad36/CRC/tree/master , but I don't think we
> actually have to fully unroll the loop like they do there.
>
> 2. With the above, we can use a larger full block size, and so on
> average less time would be spent in the carryless multiply. With that,
> we could possibly get away with an open coded loop in normal C rather
> than a new intrinsic (also found in the above repo). That would be
> more portable.
I added a port to x86 and poked at it, with the intent of having an easy
on-ramp that at least accelerates computation of CRCs on FPIs
(full-page images).
The 0008 patch only worked on chunks of 1024 bytes at a time. At that
size, the presence of hardware carryless multiplication is not that
important. I removed the hard-coded constants in favor of a lookup
table, so now it can handle anything up to 8400 bytes in a single
pass.
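
For readers following along, here is a minimal sketch of the combining step that the 3-way approach relies on. It is not the code from the attached patch: the function names are hypothetical, the three blocks are processed one after another rather than interleaved in one loop, and the "shift" is done by feeding zero bytes bit by bit. A real implementation would presumably replace that shift with a precomputed per-length constant and a carryless multiply or table lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/* Plain bitwise register update; no initial or final inversion is applied. */
static uint32_t
crc32c_update(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Advance a CRC register as if nzeros zero bytes had been appended.
 * Illustration only: this is the slow stand-in for multiplying by a
 * precomputed constant that depends on the block length.
 */
static uint32_t
crc32c_shift(uint32_t crc, size_t nzeros)
{
    while (nzeros--)
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    return crc;
}

/*
 * Hypothetical 3-way CRC: split the input into three equal blocks,
 * compute each block's CRC independently (the second and third start
 * from a zero register), then combine them. Leftover bytes are
 * handled serially.
 */
static uint32_t
crc32c_3way(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t   block = len / 3;
    uint32_t c0 = crc32c_update(crc, p, block);
    uint32_t c1 = crc32c_update(0, p + block, block);
    uint32_t c2 = crc32c_update(0, p + 2 * block, block);

    /* CRC is linear over GF(2), so shifted partial results XOR together. */
    crc = crc32c_shift(c0, 2 * block) ^ crc32c_shift(c1, block) ^ c2;

    return crc32c_update(crc, p + 3 * block, len - 3 * block);
}
```

Because the shift amount depends only on the block length, a table of constants indexed by length lets the same combine step work for any block size, which is the point of dropping the hard-coded constants.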
There are still some "taste" issues, but I like the overall shape here
and how light it is. With more hardware support, we can go much lower
than 1024 bytes, but that can be left for future work.
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v9-0002-Implement-interleaved-CRC-calculation-combined-vi.patch | text/x-patch | 13.3 KB |
v9-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch | text/x-patch | 6.4 KB |