From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
Cc: | "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] SVE popcount support |
Date: | 2025-01-24 21:37:39 |
Message-ID: | Z5QII8eZnj_6P1Mh@nathan |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
The meson configure check seems to fail on my machine:
error: too many arguments to function call, expected 0, have 1
10 | svuint64_t popcnt = svcntb(val);
| ~~~~~~ ^~~
error: returning '__SVInt64_t' from a function with incompatible result type 'int'
12 | return popcnt == 0;
| ^~~~~~~~~~~
The autoconf version seems to work okay, though.
+ pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS $1"
I don't see any extra compiler flag tests used, so we no longer need this,
right?
+ if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then
+ pgac_arm_sve_popcnt_intrinsics=yes
+ fi
I'm curious why this doesn't use Ac_cachevar like the examples above it
(e.g., PGAC_XSAVE_INTRINSICS).
+ prog = '''
+#include <arm_sve.h>
+
+#if defined(__has_attribute) && __has_attribute (target)
+ __attribute__((target("arch=armv8-a+sve")))
+#endif
+int main(void)
+{
+ const svuint64_t val = svdup_u64(0xFFFFFFFFFFFFFFFF);
+ svuint64_t popcnt = svcntb(val);
+ /* return computed value, to prevent the above being optimized away */
+ return popcnt == 0;
+}
+'''
This test looks quite different than the autoconf one. Why is that? I
would expect them to be the same. And I think ideally the test would check
that all the intrinsics functions we need are available.
+/*
+ * Returns true if the CPU supports the instructions required for the SVE
+ * pg_popcount() implementation.
+ */
+bool
+pg_popcount_sve_available(void)
+{
+ return getauxval(AT_HWCAP) & HWCAP_SVE;
+}
pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit
more complicated than this. Are we missing something here?
+ /*
+ * For smaller inputs, aligning the buffer degrades the performance.
+ * Therefore, the buffers only when the input size is sufficiently large.
+ */
Is the inverse true, i.e., does aligning the buffer improve performance for
larger inputs? I'm also curious what level of performance degradation you
were seeing.
+#include "c.h"
+#include "port/pg_bitutils.h"
+
+#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above
the #include for pg_bitutils.h (but below the one for c.h).
--
nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2025-01-24 21:45:23 | Re: Windows: openssl & gssapi dislike each other |
Previous Message | Masahiko Sawada | 2025-01-24 21:25:31 | Re: vacuumdb changes for stats import/export |