Re: [PATCH] SVE popcount support

From: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>
Subject: Re: [PATCH] SVE popcount support
Date: 2025-02-04 09:01:33
Message-ID: OSBPR01MB266403FD4C05DAB58EBBA82897EF2@OSBPR01MB2664.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The meson configure check seems to fail on my machine
> This test looks quite different than the autoconf one. Why is that? I
would expect them to be the same. And I think ideally the test would check
that all the intrinsics functions we need are available.

Fixed, both meson and autoconf have the same test program with all the intrinsics.
Meson should work now.

> + pgac_save_CFLAGS=$CFLAGS
> + CFLAGS="$pgac_save_CFLAGS $1"

> I don't see any extra compiler flag tests used, so we no longer need this,
right?

True, removed it.

> + if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then
> + pgac_arm_sve_popcnt_intrinsics=yes
> + fi

> I'm curious why this doesn't use Ac_cachevar like the examples above it
(e.g., PGAC_XSAVE_INTRINSICS).

Implemented using Ac_cachevar similar to PGAC_XSAVE_INTRINSICS.

> +#include "c.h"
> +#include "port/pg_bitutils.h"
> +
> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK

> nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above
the #include for pg_bitutils.h (but below the one for c.h).

Done.

> pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit
more complicated than this. Are we missing something here?

SVE is only available in aarch64, so we don't need to worry about aarch32. The latest patch
includes runtime checks for Linux and FreeBSD. For all other operating systems, false is
returned, because we are unable to verify the check.

> + /*
> + * For smaller inputs, aligning the buffer degrades the performance.
> + * Therefore, the buffers only when the input size is sufficiently large.
> + */

> Is the inverse true, i.e., does aligning the buffer improve performance for
> larger inputs? I'm also curious what level of performance degradation you
> were seeing.

Here is a comparison of all three cases. Alignment is marginally better for inputs
above 1024B, but the difference is small. Unaligned performs better for smaller inputs.
Aligned After 128B => the current implementation "if (aligned != buf && bytes > 4 * vec_len)"
Always Aligned => condition "bytes > 4 * vec_len" is removed.
Unaligned => the whole if block was removed

buf | Always Aligned | Aligned After 128B | Unaligned
--------+---------------+--------------------+------------
16 | 37.851 | 38.203 | 34.971
32 | 37.859 | 38.187 | 34.972
64 | 37.611 | 37.405 | 34.121
128 | 45.357 | 45.897 | 41.890
256 | 62.440 | 63.454 | 58.666
512 | 100.120 | 102.767 | 99.861
1024 | 159.574 | 158.594 | 164.975
2048 | 282.354 | 281.198 | 283.937
4096 | 532.038 | 531.068 | 533.699
8192 | 1038.973 | 1038.083 | 1039.206
16384 | 2028.604 | 2025.843 | 2033.940

---
Chiranmoy

Attachment Content-Type Size
v4-0001-SVE-support-for-popcount-and-popcount-masked.patch application/octet-stream 14.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shayon Mukherjee 2025-02-04 09:44:43 Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)
Previous Message Masahiko Sawada 2025-02-04 08:59:11 Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary