From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: CRC32C Parallel Computation Optimization on ARM |
Date: | 2023-10-24 21:09:54 |
Message-ID: | 20231024210954.GA918290@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Oct 20, 2023 at 05:18:56PM +0900, Michael Paquier wrote:
> On Fri, Oct 20, 2023 at 07:08:58AM +0000, Xiang Gao wrote:
>> This patch uses a parallel computing optimization algorithm to
>> improve crc32c computing performance on ARM. The algorithm comes
>> from Intel whitepaper:
>> crc-iscsi-polynomial-crc32-instruction-paper. Input data is divided
>> into three equal-sized blocks.Three parallel blocks (crc0, crc1,
>> crc2) for 1024 Bytes.One Block: 42(BLK_LENGTH) * 8(step length:
>> crc32c_u64) bytes
>>
>> Crc32c unitest: https://gist.github.com/gaoxyt/138fd53ca1eead8102eeb9204067f7e4
>> Crc32c benchmark: https://gist.github.com/gaoxyt/4506c10fc06b3501445e32c4257113e9
>> It gets ~2x speedup compared to linear Arm crc32c instructions.
>
> Interesting. Could you attached to this thread the test files you
> used and the results obtained please? If this data gets deleted from
> github, then it would not be possible to refer back to what you did at
> the related benchmark results.
>
> Note that your patch is forgetting about meson; it just patches
> ./configure.
I'm able to reproduce the speedup with the provided benchmark on an Apple
M1 Pro (which appears to have the required instructions). There was almost
no change for the 512-byte case, but there was a ~60% speedup for the
4096-byte case.
However, I couldn't produce any noticeable speedup with Heikki's pg_waldump
benchmark [0]. I haven't had a chance to dig further, unfortunately.
Assuming I'm not doing something wrong, I don't think such a result should
necessarily disqualify this optimization, though.
[0] https://postgr.es/m/ec487192-f6aa-509a-cacb-6642dad14209%40iki.fi
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-10-24 21:18:06 | Re: CRC32C Parallel Computation Optimization on ARM |
Previous Message | Daniel Gustafsson | 2023-10-24 20:45:48 | Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc? |