Re: CRC32C Parallel Computation Optimization on ARM

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CRC32C Parallel Computation Optimization on ARM
Date: 2024-12-04 00:15:19
Message-ID: CANWCAZbLdjnQg4ha3ajz_YfA5jf2V8w45x+=K0EHbymDP4HytQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 4, 2023 at 2:27 PM Xiang Gao <Xiang(dot)Gao(at)arm(dot)com> wrote:
>
> [v8 patch]

I have a couple quick thoughts on this:

1. I looked at a couple implementations of this idea, and found that
the constants used in the carryless multiply are tied to the length of
the blocks. With a lookup table we can do the 3-way algorithm on any
portion of a full block length, rather than immediately fall to doing
CRC serially. That would be faster on average. See for example
https://github.com/komrad36/CRC/tree/master , but I don't think we
actually have to fully unroll the loop like they do there.

2. With the above, we can use a larger full block size, and so on
average less time would be spent in the carryless multiply. With that,
we could possibly get away with an open coded loop in normal C rather
than a new intrinsic (also found in the above repo). That would be
more portable.

--
John Naylor
Amazon Web Services.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-12-04 00:42:03 Re: Using Expanded Objects other than Arrays from plpgsql
Previous Message Michael Paquier 2024-12-03 23:36:56 Re: Sequence Access Methods, round two