Re: CRC32C Parallel Computation Optimization on ARM

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CRC32C Parallel Computation Optimization on ARM
Date: 2024-12-11 07:08:58
Message-ID: CANWCAZaHcxPPZW8eyxwXt8JAyV34KfsPfzq3pceEv0Pi-AsY3Q@mail.gmail.com
Lists: pgsql-hackers

I wrote:
>
> 1. I looked at a couple implementations of this idea, and found that
> the constants used in the carryless multiply are tied to the length of
> the blocks. With a lookup table we can do the 3-way algorithm on any
> portion of a full block length, rather than immediately fall to doing
> CRC serially. That would be faster on average. See for example
> https://github.com/komrad36/CRC/tree/master , but I don't think we
> actually have to fully unroll the loop like they do there.
>
> 2. With the above, we can use a larger full block size, and so on
> average less time would be spent in the carryless multiply. With that,
> we could possibly get away with an open coded loop in normal C rather
> than a new intrinsic (also found in the above repo). That would be
> more portable.

I added a port to x86 and poked at it, with the intent of having an easy
on-ramp that at least accelerates computation of CRCs on full-page
images (FPIs).

The 0008 patch only worked on chunks of 1024 bytes at a time. At that
size, the presence of hardware carryless multiplication is not that
important. I removed the hard-coded constants in favor of a lookup
table, so now it can handle anything up to 8400 bytes in a single
pass.
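
For readers following along, here is a minimal sketch of the underlying
idea. It is not code from the attached patches. It splits the input into
three equal blocks, computes a CRC over each block independently, and
combines the results. The function names (crc32c_raw, crc32c_shift,
crc32c_3way) are made up for illustration. The combine step advances a
partial CRC by feeding it zero bytes, which is a slow stand-in for
multiplying by a length-dependent constant taken from a lookup table, and
the three block CRCs are computed one after another rather than in
interleaved hardware instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u	/* reflected Castagnoli polynomial */

/* Bitwise update of the raw CRC register (no initial or final XOR). */
static uint32_t
crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1)));
	}
	return crc;
}

/*
 * Advance a register as if n zero bytes had been appended.  Illustrative
 * stand-in: a real implementation would replace this loop with a
 * multiplication by a constant that depends only on n.
 */
static uint32_t
crc32c_shift(uint32_t crc, size_t n)
{
	while (n--)
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1)));
	return crc;
}

/* Three-way split: blocks A, B, C of n bytes each, then a serial tail. */
static uint32_t
crc32c_3way(uint32_t crc, const unsigned char *p, size_t len)
{
	size_t		n = len / 3;
	uint32_t	a = crc32c_raw(crc, p, n);		/* carries the input state */
	uint32_t	b = crc32c_raw(0, p + n, n);	/* independent, starts at 0 */
	uint32_t	c = crc32c_raw(0, p + 2 * n, n);	/* independent, starts at 0 */

	/* CRC is linear over GF(2), so shifted partial results simply XOR. */
	crc = crc32c_shift(a, 2 * n) ^ crc32c_shift(b, n) ^ c;

	/* Handle the up-to-two leftover bytes serially. */
	return crc32c_raw(crc, p + 3 * n, len - 3 * n);
}
```

With the usual CRC-32C conventions (initial value 0xFFFFFFFF, final XOR
with 0xFFFFFFFF), crc32c_3way should return the same value as
crc32c_raw for any input length. Because the shift amount depends only on
the block length n, the shift can be reduced to a per-length constant,
which is what makes a lookup table a natural fit.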

There are still some "taste" issues, but I like the overall shape here
and how lightweight it is. With more hardware support, we can go much
lower than 1024 bytes, but that can be left for future work.

--
John Naylor
Amazon Web Services

Attachments:
  v9-0002-Implement-interleaved-CRC-calculation-combined-vi.patch  (text/x-patch, 13.3 KB)
  v9-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch  (text/x-patch, 6.4 KB)
