From: | "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
---|---|
To: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Rajat(dot)Ma(at)fujitsu(dot)com" <Rajat(dot)Ma(at)fujitsu(dot)com> |
Subject: | Re: Popcount optimization using SVE for ARM |
Date: | 2024-12-06 09:29:29 |
Message-ID: | OSZPR01MB849988C754B8EA76D1193D018B312@OSZPR01MB8499.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Hi Kirill,
This work has been conducted independently and is not connected to https://www.postgresql.org/message-id/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com.
Our patch uses the existing infrastructure, i.e. the "choose_popcount_functions" function, to select the appropriate popcount implementation for the architecture, and therefore requires fewer code changes. The patch also includes implementations of popcount32, popcount64 and masked popcount. We'd be happy to discuss any potential overlaps and to collaborate further so that the best solution is integrated.
Looking forward to your feedback!
Thanks & regards,
Susmitha Devanga.
________________________________
From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Sent: Friday, December 6, 2024 12:52
To: Susmitha, Devanga <Devanga(dot)Susmitha(at)fujitsu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org <pgsql-hackers(at)postgresql(dot)org>; Hajela, Ragesh <Ragesh(dot)Hajela(at)fujitsu(dot)com>; Bhattacharya, Chiranmoy <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>; M A, Rajat <Rajat(dot)Ma(at)fujitsu(dot)com>
Subject: Re: Popcount optimization using SVE for ARM
On Fri, 6 Dec 2024 at 10:54, Devanga(dot)Susmitha(at)fujitsu(dot)com <Devanga(dot)Susmitha(at)fujitsu(dot)com> wrote:
Hello, this email is to discuss contributing the popcount and masked popcount speed-up we have developed for the ARM architecture using SVE intrinsics.
The current popcount on ARM relies on compiler intrinsics or C code, which process data in a scalar fashion, one integer at a time. With SVE intrinsics, popcount can process multiple integers per instruction, depending on the vector length, which significantly improves performance.
We have designed this feature to ensure compatibility and robustness. It includes compile-time and runtime checks for SVE compatibility with both the compiler and hardware. If either check fails, the code falls back to the existing scalar implementation, ensuring fail-safe operation. Additionally, we leveraged the existing infrastructure to select between different popcount implementations, avoiding additional complexity.
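The check-then-fall-back selection described above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the names `popcount_scalar`, `popcount_sve_placeholder`, `sve_available`, `popcount_choose` and `popcount_buffer` are hypothetical, the "SVE" function is only a stand-in that calls the scalar code, and the runtime probe assumes Linux on AArch64 (`getauxval` with `HWCAP_SVE`). It also assumes GCC or Clang for `__builtin_popcount`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Scalar fallback: count set bits one byte at a time. */
static uint64_t
popcount_scalar(const unsigned char *buf, size_t len)
{
    uint64_t count = 0;

    for (size_t i = 0; i < len; i++)
        count += (uint64_t) __builtin_popcount(buf[i]);
    return count;
}

/*
 * Hypothetical stand-in for an SVE implementation.  A real version would
 * use SVE intrinsics; here it simply reuses the scalar code so the sketch
 * builds on any machine.
 */
static uint64_t
popcount_sve_placeholder(const unsigned char *buf, size_t len)
{
    return popcount_scalar(buf, len);
}

/* Runtime check: does the CPU/kernel report SVE support? */
static int
sve_available(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;                   /* compile-time check failed: no SVE path */
#endif
}

static uint64_t popcount_choose(const unsigned char *buf, size_t len);

/* The pointer starts at the chooser, which replaces it on first use. */
static uint64_t (*popcount_impl) (const unsigned char *, size_t) = popcount_choose;

static uint64_t
popcount_choose(const unsigned char *buf, size_t len)
{
    popcount_impl = sve_available() ? popcount_sve_placeholder : popcount_scalar;
    return popcount_impl(buf, len);
}

/* Public entry point: callers never need to know which path was chosen. */
uint64_t
popcount_buffer(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return popcount_impl(buf, len);
}
```

The first call to `popcount_buffer` goes through the chooser; every later call jumps directly to the selected implementation, so the capability check is paid only once.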
Algorithm Overview:
1. For larger inputs, align the buffer to avoid double loads. For smaller inputs, alignment is unnecessary and may even degrade performance.
2. Process the aligned buffer chunk by chunk till the last incomplete chunk.
3. Process the last incomplete chunk.
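The three steps above can be sketched in portable C as follows. This is only an illustration of the control flow, not the SVE code from the patch: an 8-byte word stands in for one SVE vector, `CHUNK` and `ALIGN_THRESHOLD` are hypothetical values, and `popcount_chunked` is a hypothetical name. It assumes GCC or Clang for the `__builtin_popcount` family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8                 /* stand-in for the vector length in bytes */
#define ALIGN_THRESHOLD 64      /* hypothetical cut-off for "larger inputs" */

uint64_t
popcount_chunked(const unsigned char *buf, size_t len)
{
    uint64_t count = 0;

    assert(buf != NULL || len == 0);

    /* Step 1: for larger inputs only, consume bytes until buf is aligned. */
    if (len >= ALIGN_THRESHOLD)
    {
        while (len > 0 && ((uintptr_t) buf % CHUNK) != 0)
        {
            count += (uint64_t) __builtin_popcount(*buf++);
            len--;
        }
    }

    /* Step 2: process full chunks. */
    while (len >= CHUNK)
    {
        uint64_t word;

        memcpy(&word, buf, CHUNK);  /* safe for unaligned small inputs too */
        count += (uint64_t) __builtin_popcountll(word);
        buf += CHUNK;
        len -= CHUNK;
    }

    /* Step 3: process the last incomplete chunk byte by byte. */
    while (len > 0)
    {
        count += (uint64_t) __builtin_popcount(*buf++);
        len--;
    }

    return count;
}
```

Small inputs skip the alignment prologue entirely, which matches the observation in step 1 that aligning them is not worth the cost.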
Our setup:
Machine: AWS EC2 c7g.8xlarge - 32 vCPUs, 64 GB RAM
OS: Ubuntu 22.04.5 LTS
GCC: 11.4
Benchmark and Result:
We used John Naylor's popcount-test-module [0] for benchmarking and observed a speed-up of more than 3x for larger buffers. Even for small inputs of 8 and 32 bytes, we observed no performance degradation.
We would like to contribute this work so that it is available to the community. To do so, we are following the procedure described in "Submitting a Patch" on the PostgreSQL wiki <https://wiki.postgresql.org/wiki/Submitting_a_Patch>. Please find the patch and the performance results attached.
Please let us know if you have any queries or suggestions.
Thanks & Regards,
Susmitha Devanga.
Hi! Is this patch somehow related to [0]?
--
Best regards,
Kirill Reshke