From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
Cc: | "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] SVE popcount support |
Date: | 2025-03-11 21:11:18 |
Message-ID: | Z9Cm9j-xLnbaHwxz@nathan |
Lists: | pgsql-hackers |
On Fri, Mar 07, 2025 at 03:20:07AM +0000, Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com wrote:
> Sounds good. Let us know your findings.
Alright, here's what I saw on an R8g for drive_popcount(1000000, N):

8-byte words        master     v5-no-sve        v5-sve       v5-4reg
           1      2.540 ms      2.170 ms      1.807 ms      2.178 ms
           2      2.534 ms      2.180 ms      1.804 ms      2.167 ms
           4      3.988 ms      3.240 ms      1.590 ms      2.879 ms
           8      5.033 ms      4.672 ms      2.175 ms      2.525 ms
          16      8.252 ms     10.916 ms      3.235 ms      3.588 ms
          32     20.932 ms     22.883 ms      5.134 ms      5.395 ms
          64     40.446 ms     45.668 ms      9.817 ms      9.285 ms
         128     66.087 ms     91.386 ms     20.072 ms     17.175 ms
         256    153.852 ms    182.594 ms     40.447 ms     32.212 ms
         512    246.271 ms    300.941 ms     87.116 ms     60.729 ms
        1024    487.180 ms    607.289 ms    180.574 ms    116.948 ms
        2048    969.335 ms   1223.838 ms    363.595 ms    232.575 ms
        4096   1934.646 ms   2472.154 ms    729.525 ms    459.495 ms

(Note that there should be no need to test anything smaller than 8 bytes
because we use the inline version in pg_bitutils.h in that case.)
v5-no-sve is the result of using a function pointer, but pointing to the
"slow" versions instead of the SVE version. v5-sve is the result of the
latest patch in this thread on a machine with SVE support, and v5-4reg is
the result of the latest patch in this thread modified to process four
registers' worth of data at a time.
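
For readers unfamiliar with the function-pointer arrangement measured in the v5-no-sve column, the following is a minimal sketch of the general pattern, not the code from the patch. All names here (popcount_impl, popcount_choose, sve_available, popcount_wide_placeholder) are hypothetical, and the CPU check is a stub that always reports "no SVE", so the pointer ends up at the slow path while every call still pays for the indirection.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time popcount, standing in for the "slow" path. */
static uint64_t
popcount_slow(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	while (bytes--)
	{
		unsigned char c = (unsigned char) *buf++;

		/* Clear the lowest set bit until none remain. */
		while (c)
		{
			c &= (unsigned char) (c - 1);
			popcnt++;
		}
	}
	return popcnt;
}

/* Placeholder for a vectorized path: counts 8 bytes per step with SWAR. */
static uint64_t
popcount_wide_placeholder(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	while (bytes >= 8)
	{
		uint64_t	x;

		memcpy(&x, buf, 8);		/* avoids unaligned loads */
		x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
		x = (x & UINT64_C(0x3333333333333333)) +
			((x >> 2) & UINT64_C(0x3333333333333333));
		x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
		popcnt += (x * UINT64_C(0x0101010101010101)) >> 56;
		buf += 8;
		bytes -= 8;
	}
	return popcnt + popcount_slow(buf, bytes);
}

/* Hypothetical stand-in for a runtime CPU feature check (assumption). */
static bool
sve_available(void)
{
	return false;
}

static uint64_t popcount_choose(const char *buf, size_t bytes);

/* All callers go through this pointer, so every call is indirect. */
static uint64_t (*popcount_impl) (const char *buf, size_t bytes) = popcount_choose;

/* First call lands here, picks an implementation, then forwards to it. */
static uint64_t
popcount_choose(const char *buf, size_t bytes)
{
	popcount_impl = sve_available() ? popcount_wide_placeholder : popcount_slow;
	return popcount_impl(buf, bytes);
}

static uint64_t
popcount(const char *buf, size_t bytes)
{
	return popcount_impl(buf, bytes);
}
```

After the first call to popcount(), the pointer is fixed for the life of the process, but the compiler can no longer inline or directly call the chosen function, which is the overhead the v5-no-sve column is meant to isolate.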
The biggest takeaways for me are as follows:
* The 4-register version does show some nice improvements as the data
grows.
* Machines without SVE will likely incur a rather sizable regression from
the newly introduced function pointer.
For the latter point, I think we should consider trying to add a separate
Neon implementation that we use as a fallback for machines that don't have
SVE. My understanding is that Neon is virtually universally supported on
64-bit Arm gear, so that will not only help offset the function pointer
overhead but may even improve performance for a much wider set of machines.
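
To make the Neon idea concrete, here is a rough sketch of what such a fallback could look like. It is only an illustration under stated assumptions, not a proposed patch: the function name popcount_neon_sketch and the HAVE_NEON_SKETCH macro are made up, it processes one 16-byte register per iteration with no unrolling, and it compiles down to the scalar loop alone on non-AArch64 builds. The intrinsics used (vld1q_u8, vcntq_u8, vaddlvq_u8) are standard AArch64 Neon intrinsics from arm_neon.h.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1		/* hypothetical macro for this sketch */
#endif

static uint64_t
popcount_neon_sketch(const char *buf, size_t bytes)
{
	const uint8_t *p = (const uint8_t *) buf;
	uint64_t	popcnt = 0;

#ifdef HAVE_NEON_SKETCH
	/* Process one 128-bit register (16 bytes) per iteration. */
	while (bytes >= 16)
	{
		uint8x16_t	v = vld1q_u8(p);

		/* vcntq_u8 counts set bits per byte; vaddlvq_u8 sums the 16 lanes. */
		popcnt += vaddlvq_u8(vcntq_u8(v));
		p += 16;
		bytes -= 16;
	}
#endif

	/* Scalar tail, or the whole buffer when Neon is not available. */
	while (bytes--)
	{
		uint8_t		c = *p++;

		while (c)
		{
			c &= (uint8_t) (c - 1);
			popcnt++;
		}
	}
	return popcnt;
}
```

A real implementation would presumably accumulate per-lane counts across several registers before doing the horizontal add, in the same spirit as the 4-register SVE variant measured above, but that is left out here to keep the sketch short.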
--
nathan