Re: CRC32C Parallel Computation Optimization on ARM

From:	John Naylor <johncnaylorls(at)gmail(dot)com>
To:	Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>
Cc:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: CRC32C Parallel Computation Optimization on ARM
Date:	2024-12-04 00:15:19
Message-ID:	CANWCAZbLdjnQg4ha3ajz_YfA5jf2V8w45x+=K0EHbymDP4HytQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 4, 2023 at 2:27 PM Xiang Gao <Xiang(dot)Gao(at)arm(dot)com> wrote:
>
> [v8 patch]

I have a couple quick thoughts on this:

1. I looked at a couple of implementations of this idea, and found that
the constants used in the carryless multiply are tied to the length of
the blocks. With a lookup table we can do the 3-way algorithm on any
portion of a full block length, rather than immediately falling back to
computing the CRC serially. That would be faster on average. See for example
https://github.com/komrad36/CRC/tree/master , but I don't think we
actually have to fully unroll the loop like they do there.

2. With the above, we can use a larger full block size, so on
average less time would be spent in the carryless multiply. With that,
we could possibly get away with an open-coded loop in plain C rather
than a new intrinsic (also found in the above repo). That would be
more portable.
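
To make the two points above concrete, here is a rough, portable sketch of
the idea. It is not taken from the v8 patch or from the linked repo. All names
(crc32c_3way, shift_tab, MAX_BLOCK, and so on) are made up for illustration,
the bitwise loops stand in for the hardware CRC instructions, and the
constants are the plain x^(8n) mod P values; an implementation built on an
actual carryless-multiply instruction would need differently adjusted
constants. The point is only that the combining constant depends on the block
length n, so a table indexed by n lets the 3-way split work on any length up
to the full block size, and that the multiply can be an open-coded loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u     /* reflected CRC-32C polynomial */
#define MAX_BLOCK   1024            /* assumed full block size, in bytes */

/* shift_tab[n] = x^(8n) mod P, in reflected bit order (hypothetical table) */
static uint32_t shift_tab[2 * MAX_BLOCK + 1];
static int      shift_tab_ready = 0;    /* lazy init; not thread-safe */

/* Multiply by x modulo P (one bit step of the reflected CRC register). */
static uint32_t mul_x(uint32_t v)
{
    return (v >> 1) ^ (CRC32C_POLY & (0u - (v & 1u)));
}

/* Open-coded carryless multiply modulo P; bit 31 holds the x^0 term. */
static uint32_t gf2_mul(uint32_t a, uint32_t b)
{
    uint32_t prod = 0;

    for (int i = 31; i >= 0; i--)
    {
        if (a & (1u << i))
            prod ^= b;
        b = mul_x(b);
    }
    return prod;
}

/* Raw register update with no initial or final inversion. */
static uint32_t crc_raw(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = mul_x(crc);
    }
    return crc;
}

static void init_shift_tab(void)
{
    uint32_t v = 0x80000000u;       /* x^0 */

    for (size_t n = 0; n <= 2 * MAX_BLOCK; n++)
    {
        shift_tab[n] = v;
        for (int k = 0; k < 8; k++)
            v = mul_x(v);           /* advance by one zero byte */
    }
    shift_tab_ready = 1;
}

/* Reference: plain serial CRC-32C. */
static uint32_t crc32c_serial(const uint8_t *p, size_t len)
{
    return ~crc_raw(~0u, p, len);
}

/* 3-way split; the constants are looked up by the actual block length. */
static uint32_t crc32c_3way(const uint8_t *p, size_t len)
{
    uint32_t crc = ~0u;

    assert(p != NULL || len == 0);
    if (!shift_tab_ready)
        init_shift_tab();

    while (len >= 3)
    {
        size_t   n = len / 3;
        uint32_t a, b, c;

        if (n > MAX_BLOCK)
            n = MAX_BLOCK;

        /* Three independent streams; hardware would interleave these. */
        a = crc_raw(crc, p, n);
        b = crc_raw(0, p + n, n);
        c = crc_raw(0, p + 2 * n, n);

        /* Shift a past 2n bytes and b past n bytes, then fold together. */
        crc = gf2_mul(a, shift_tab[2 * n]) ^ gf2_mul(b, shift_tab[n]) ^ c;

        p += 3 * n;
        len -= 3 * n;
    }

    /* At most two bytes remain for the serial tail. */
    return ~crc_raw(crc, p, len);
}
```

In this sketch only the last one or two bytes ever go through the serial
tail, because a partial block still gets split three ways using the constants
for its own length. The table costs 4 bytes per supported length here; a real
patch would presumably index it in units of the CRC instruction width rather
than single bytes.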

--
John Naylor
Amazon Web Services.

In response to

RE: CRC32C Parallel Computation Optimization on ARM at 2023-12-04 07:27:01 from Xiang Gao

Responses

Re: CRC32C Parallel Computation Optimization on ARM at 2024-12-11 07:08:58 from John Naylor