Re: [PATCH] SVE popcount support

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
Cc: "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>
Subject: Re: [PATCH] SVE popcount support
Date: 2025-03-22 03:42:06
Message-ID: Z94xjuN9X7J9lSdT@nathan
Lists: pgsql-hackers

I've been preparing these for commit, and I've attached what I have so far.
A few notes:

* 0001 just renames the TRY_POPCNT_FAST macro to indicate that it's
x86_64-specific. IMO this is worth doing independent of this patch set,
but it's more important with the patch set since we need something
similar for AArch64. I think we should also consider moving the x86_64
stuff to its own file (perhaps combining it with the AVX-512 stuff), but
that can probably wait until later.

* 0002 introduces the Neon implementation, which conveniently doesn't need
configure-time checks or function pointers. I noticed that some
compilers (e.g., Apple clang 16) compile in Neon instructions already,
but our hand-rolled implementation is better about instruction-level
parallelism and seems to still be quite a bit faster.

* 0003 introduces the SVE implementation. You'll notice I've moved all the
function pointer gymnastics into the pg_popcount_aarch64.c file, which is
where the Neon implementations live, too. I also tried to clean up the
configure checks a bit. I imagine it's possible to make them more
compact, but I felt that the enhanced readability was worth it.

* For both Neon and SVE, I do see improvements with looping over 4
registers at a time, so IMHO it's worth doing so even if it performs the
same as 2-register blocks on some hardware. I did add a 2-register block
in the Neon implementation for processing the tail because I was worried
about its performance on smaller buffers, but that part might get removed
if I can't measure any difference.
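
To make the loop shape in the last bullet concrete, here is a minimal
portable sketch. It is not the code from the attached patches. It is a
scalar analog in which each 64-bit word stands in for one vector register,
so the main loop handles a 4-word block with four independent accumulators
and the tail handles at most one 2-word block. The function name is
hypothetical, and __builtin_popcountll/__builtin_popcount assume GCC or
Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical scalar analog of a 4-register popcount loop.  The name and
 * structure are illustrative only and are not taken from the patches.
 */
static uint64_t
popcount_unrolled_sketch(const char *buf, size_t bytes)
{
    uint64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    uint64_t w[4];

    /*
     * Main loop: four words per iteration.  The four accumulators do not
     * depend on each other, which leaves the CPU free to overlap the work.
     */
    while (bytes >= sizeof(w))
    {
        memcpy(w, buf, sizeof(w));  /* safe for unaligned input */
        acc0 += (uint64_t) __builtin_popcountll(w[0]);
        acc1 += (uint64_t) __builtin_popcountll(w[1]);
        acc2 += (uint64_t) __builtin_popcountll(w[2]);
        acc3 += (uint64_t) __builtin_popcountll(w[3]);
        buf += sizeof(w);
        bytes -= sizeof(w);
    }

    /* Tail: at most one two-word block can remain after the main loop. */
    if (bytes >= 2 * sizeof(uint64_t))
    {
        memcpy(w, buf, 2 * sizeof(uint64_t));
        acc0 += (uint64_t) __builtin_popcountll(w[0]);
        acc1 += (uint64_t) __builtin_popcountll(w[1]);
        buf += 2 * sizeof(uint64_t);
        bytes -= 2 * sizeof(uint64_t);
    }

    /* Whatever is left (fewer than 16 bytes) is counted byte by byte. */
    while (bytes > 0)
    {
        acc0 += (uint64_t) __builtin_popcount((unsigned char) *buf);
        buf++;
        bytes--;
    }

    return acc0 + acc1 + acc2 + acc3;
}
```

Whether the 2-word tail step pays off is the open question raised above.
In this sketch it only shortens the byte-by-byte remainder, so any benefit
would show up on small buffers.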

I'm planning to run several more benchmarks, but everything I've seen thus
far has looked pretty good.

--
nathan

Attachment                                             Content-Type  Size
v8-0001-Rename-TRY_POPCNT_FAST-to-POPCNT_X86_64.patch  text/plain    4.4 KB
v8-0002-Neon-popcount-support.patch                    text/plain    9.3 KB
v8-0003-SVE-popcount-support.patch                     text/plain    16.2 KB
