From: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] SVE popcount support |
Date: | 2025-03-12 10:34:46 |
Message-ID: | OSBPR01MB2664639C190F433EFFF65ED397D02@OSBPR01MB2664.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Wed, Mar 12, 2025 at 02:41:18AM +0000, nathandbossart(at)gmail(dot)com wrote:
> v5-no-sve is the result of using a function pointer, but pointing to the
> "slow" versions instead of the SVE version. v5-sve is the result of the
> latest patch in this thread on a machine with SVE support, and v5-4reg is
> the result of the latest patch in this thread modified to process 4
> register's worth of data at a time.
Nice. I wonder why I did not observe any performance gain with the 4reg
version. Did you modify the 4reg code?
One possible explanation is that you used Graviton4 based instances
whereas I used Graviton3 instances.
> For the latter point, I think we should consider trying to add a separate
> Neon implementation that we use as a fallback for machines that don't have
> SVE. My understanding is that Neon is virtually universally supported on
> 64-bit Arm gear, so that will not only help offset the function pointer
> overhead but may even improve performance for a much wider set of machines.
I have added the NEON implementation in the latest patch.
Here are the numbers for drive_popcount(1000000, 1024) on m7g.8xlarge (Graviton3):
Scalar - 692 ms
Neon - 298 ms
SVE - 112 ms
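For readers following the thread without the attachment, here is a minimal sketch of the general shape a Neon popcount fallback can take. This is not the code from the attached patch: the function names popcount_scalar and popcount_neon_sketch are hypothetical, the loop handles one 128-bit register per iteration, and the Neon path is compiled only when __aarch64__ is defined, so other platforms fall through to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Portable reference: count set bits one byte at a time. */
static uint64_t
popcount_scalar(const unsigned char *buf, size_t len)
{
    uint64_t    count = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = buf[i];

        while (b)
        {
            b &= (unsigned char) (b - 1);   /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/*
 * Hypothetical Neon variant (illustration only, not the patch's code).
 * Processes 16 bytes per iteration and hands the tail to the scalar loop.
 */
static uint64_t
popcount_neon_sketch(const unsigned char *buf, size_t len)
{
    uint64_t    count = 0;

#if defined(__aarch64__)
    while (len >= 16)
    {
        uint8x16_t  v = vld1q_u8(buf);  /* load 16 bytes */
        uint8x16_t  c = vcntq_u8(v);    /* per-byte bit counts (0..8) */

        count += vaddlvq_u8(c);         /* widening horizontal sum of lanes */
        buf += 16;
        len -= 16;
    }
#endif

    /* Remaining bytes, or the whole buffer when Neon is unavailable. */
    return count + popcount_scalar(buf, len);
}
```

Calling popcount_neon_sketch(buf, len) on any byte buffer should return the same value as popcount_scalar(buf, len). A tuned implementation would likely unroll over several registers and accumulate in vector lanes before reducing, which this sketch leaves out for brevity.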
-Chiranmoy
Attachment | Content-Type | Size |
---|---|---|
v6-0001-SVE-and-NEON-support-for-popcount.patch | application/octet-stream | 17.8 KB |