Re: use ARM intrinsics in pg_lfind32() where available

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: use ARM intrinsics in pg_lfind32() where available
Date: 2022-08-27 21:18:34
Message-ID: 2494641.1661635114@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I spent a bit more time researching the portability implications of
this patch. I think that we should check __ARM_NEON before #including
<arm_neon.h>; there is authoritative documentation out there telling
you to, eg [1], and I can see no upside at all to not checking.
We cannot check *only* __ARM_NEON, though. I found it to get defined
by clang 8.0.0 in my Fedora 30 32-bit image, although that does not
provide all the instructions we want (I see "undefined function"
complaints for vmaxvq_u8 etc if I try to make it use the patch).
Looking into that installation's <arm_neon.h>, those functions are
defined conditionally if "__ARM_FP & 2", which is kind of interesting
--- per [1], that bit indicates support for 16-bit floating point,
which seems a mite unrelated.

It appears from the info at [2] that there are at least some 32-bit
ARM platforms that set that bit, implying (if the clang authors are
well informed) that they have the instructions we want. But we
could not realistically make 32-bit builds that try to use those
instructions without a run-time test; such a build would fail for
too many people. I doubt that a run-time test is worth the trouble,
so I concur with the idea of selecting NEON on aarch64 only and hoping
to thereby avoid a runtime test.

In short, I think the critical part of 0002 needs to look more like
this:

+#elif defined(__aarch64__) && defined(__ARM_NEON)
+/*
+ * We use the Neon instructions if the compiler provides access to them
+ * (as indicated by __ARM_NEON) and we are on aarch64. While Neon support is
+ * technically optional for aarch64, it appears that all available 64-bit
+ * hardware does have it. Neon exists in some 32-bit hardware too, but
+ * we could not realistically use it there without a run-time check,
+ * which seems not worth the trouble for now.
+ */
+#include <arm_neon.h>
+#define USE_NEON
...

Coding like this appears to work on both my Apple M1 and my Raspberry
Pi, with several different OSes checked on the latter.

regards, tom lane

[1] https://developer.arm.com/documentation/101754/0618/armclang-Reference/Other-Compiler-specific-Features/Predefined-macros
[2] http://micro-os-plus.github.io/develop/predefined-macros/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-08-27 22:12:34 Re: use ARM intrinsics in pg_lfind32() where available
Previous Message Ibrar Ahmed 2022-08-27 19:28:26 [Commitfest 2022-09] Begins This Thursday