From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] SVE popcount support |
Date: | 2025-03-26 21:44:24 |
Message-ID: | Z-R1OP2s3mYs_DIP@nathan |
Lists: | pgsql-hackers |
I've attached a new set of patches in which I've tried to address John's
feedback. I ran some new benchmarks with these patches. "M3" is an Apple
M3 (my laptop), "G3" is an r7g.4xlarge, and "G4" is an r8g.4xlarge. "no
SVE" means the patches are applied but the function pointer points to the
Neon implementation. "SVE" and "patched" mean all the patches are applied
with no changes.
8-byte words | M3 HEAD | M3 patched | G3 HEAD | G3 no SVE | G3 SVE | G4 HEAD | G4 no SVE | G4 SVE
--------------+---------+------------+---------+-----------+---------+---------+-----------+---------
1 | 3.6 | 3.0 | 3.1 | 2.9 | 3.1 | 2.5 | 2.2 | 1.8
2 | 6.4 | 4.4 | 3.1 | 3.0 | 3.1 | 2.5 | 2.5 | 2.0
3 | 7.3 | 6.9 | 3.5 | 3.5 | 3.1 | 3.3 | 3.2 | 2.0
4 | 8.0 | 3.8 | 4.0 | 2.7 | 4.7 | 3.6 | 2.2 | 2.7
5 | 9.4 | 5.5 | 4.6 | 2.8 | 4.6 | 3.9 | 2.5 | 2.7
6 | 7.9 | 5.0 | 5.1 | 3.5 | 4.7 | 4.3 | 3.1 | 3.4
7 | 10.2 | 7.4 | 5.9 | 4.0 | 4.7 | 4.7 | 3.6 | 3.4
8 | 12.0 | 5.4 | 6.5 | 4.0 | 5.9 | 5.0 | 3.2 | 2.5
9 | 11.7 | 6.5 | 7.2 | 4.3 | 5.9 | 5.4 | 3.6 | 2.5
10 | 12.5 | 5.4 | 8.0 | 4.8 | 5.9 | 6.2 | 3.9 | 3.1
11 | 14.0 | 8.6 | 8.5 | 5.5 | 5.9 | 6.1 | 5.0 | 3.1
12 | 13.1 | 5.7 | 9.1 | 5.1 | 7.4 | 6.4 | 3.9 | 3.6
13 | 12.1 | 6.8 | 9.8 | 5.4 | 7.3 | 6.8 | 4.3 | 3.6
14 | 16.4 | 7.8 | 10.4 | 5.9 | 7.4 | 7.2 | 4.7 | 4.4
15 | 17.4 | 8.0 | 11.1 | 6.6 | 7.4 | 7.5 | 5.7 | 4.4
16 | 15.5 | 5.7 | 11.8 | 5.7 | 4.7 | 7.9 | 5.0 | 3.5
32 | 26.0 | 16.2 | 22.7 | 10.3 | 6.2 | 16.8 | 8.4 | 5.2
64 | 38.5 | 20.3 | 42.7 | 20.1 | 9.3 | 31.8 | 15.4 | 8.8
128 | 75.1 | 35.7 | 86.1 | 35.0 | 15.4 | 80.2 | 28.6 | 16.3
256 | 117.7 | 51.8 | 179.6 | 68.2 | 27.8 | 154.0 | 55.7 | 30.9
512 | 198.5 | 93.1 | 329.3 | 134.4 | 52.4 | 246.5 | 110.2 | 59.4
1024 | 355.0 | 159.2 | 673.6 | 265.8 | 101.7 | 487.0 | 219.0 | 114.7
2048 | 669.5 | 288.8 | 1294.7 | 529.7 | 200.3 | 969.3 | 438.7 | 228.5
4096 | 1308.0 | 552.8 | 2784.3 | 1063.0 | 397.4 | 1934.5 | 874.4 | 455.9
IMHO these are acceptable results, at least for the use cases I see in the
tree. We might be able to minimize the difference between the Neon and SVE
implementations on the low end with some additional code, but I'm really
not sure if it's worth the effort.
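For readers unfamiliar with how a "no SVE" configuration can be produced, here is a minimal sketch of function-pointer dispatch. It is not the code from the attached patches: every name in it (popcount_baseline, popcount_wide, cpu_has_wide_path, example_popcount, example_force_baseline) is hypothetical, and both implementations are portable stand-ins rather than real Neon or SVE code. The assumption is only that a pointer is resolved once to the best available implementation and can be redirected to the baseline for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not code from the patches. */
typedef uint64_t (*popcount_fn) (const char *buf, size_t bytes);

/* Stand-in for a baseline path: clear the lowest set bit until zero. */
static uint64_t
popcount_baseline(const char *buf, size_t bytes)
{
	uint64_t	count = 0;

	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));	/* safe for unaligned input */
		for (; word != 0; word &= word - 1)
			count++;
		buf += sizeof(word);
		bytes -= sizeof(word);
	}
	while (bytes-- > 0)
	{
		unsigned int byte = (unsigned char) *buf++;

		for (; byte != 0; byte &= byte - 1)
			count++;
	}
	return count;
}

/* Stand-in for a "wider" path: branch-free SWAR count per 64-bit word. */
static uint64_t
popcount_wide(const char *buf, size_t bytes)
{
	uint64_t	count = 0;

	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	x;

		memcpy(&x, buf, sizeof(x));
		x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
		x = (x & UINT64_C(0x3333333333333333)) +
			((x >> 2) & UINT64_C(0x3333333333333333));
		x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
		count += (x * UINT64_C(0x0101010101010101)) >> 56;
		buf += sizeof(x);
		bytes -= sizeof(x);
	}
	/* Remaining tail bytes go through the baseline path. */
	return count + popcount_baseline(buf, bytes);
}

/* Hypothetical feature check; a real one would query the CPU or OS. */
static int
cpu_has_wide_path(void)
{
	return 1;
}

static uint64_t popcount_choose(const char *buf, size_t bytes);

/* Starts at the chooser, which replaces itself on the first call. */
static popcount_fn popcount_ptr = popcount_choose;

static uint64_t
popcount_choose(const char *buf, size_t bytes)
{
	popcount_ptr = cpu_has_wide_path() ? popcount_wide : popcount_baseline;
	return popcount_ptr(buf, bytes);
}

/* Entry point: every call goes through the pointer. */
static uint64_t
example_popcount(const char *buf, size_t bytes)
{
	assert(buf != NULL || bytes == 0);
	return popcount_ptr(buf, bytes);
}

/* Pins the pointer to the baseline, e.g. for an A/B benchmark run. */
static void
example_force_baseline(void)
{
	popcount_ptr = popcount_baseline;
}
```

In this sketch, calling example_force_baseline() before a benchmark run plays the role of the "no SVE" columns: the same binary is measured, but the indirect call lands on the baseline routine. Both paths are expected to return identical counts for any input.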
Barring feedback or objections, I'm planning to commit these on Friday.
--
nathan
Attachment | Content-Type | Size |
---|---|---|
v9-0001-Rename-TRY_POPCNT_FAST-to-TRY_POPCNT_X86_64.patch | text/plain | 4.7 KB |
v9-0002-Add-Neon-popcount-support.patch | text/plain | 10.2 KB |
v9-0003-Add-SVE-popcount-support.patch | text/plain | 16.9 KB |