Re: Detection of hadware feature => please do not use signal

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Bastien Roucariès <rouca(at)debian(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Detection of hadware feature => please do not use signal
Date: 2024-11-01 04:28:53
Message-ID: CA+hUKGLRXq0-s0n2=dy_A5iY-Omg628osyJEJMjPQ=6m566UgA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It occurs to me to wonder whether the existing code works on Windows.
> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
> a reasonable target now.

I looked into that[1] and decided that I was going to ignore it
completely, because:

* Windows defines a dummy SIGILL signal number as required by the C
standard, so stuff like this compiles, but
EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just
crash instead, but even if it were...
* we don't install a native signal handler anyway, just one of our
fake ones (that I would love to get rid of)
* there is also a native API to check CPU features, but I don't see
the point in even thinking about it for now, because...
* Windows 11 effectively requires ARMv8.1-A to boot, and that's what
comes installed on current machines you can buy (this is expressed
differently officially, by saying that LSE atomics are required and by
listing the supported CPUs by model not ISA, and the list of CPUs that
are no longer supported can also be found online, so it's fairly
clear; there are also fun investigative reports from the time when
RPI4s and other never-officially-supported systems could suddenly no
longer boot after some update, as they would croak on LSE
instructions)
* if someone is using Windows 10 for ARM (which I gather is getting
harder to obtain by now) on an old enough low power laptop that lacks
the instruction (and note that most ARMv8-A chips *did* have that
instruction as an optional extra anyway, they just didn't have LSE),
well then maybe it's an issue, but for a new platform where we
are so short of developers that we haven't even got past other really
basic starter problems after a couple of years of talking about it, I
think we should focus only on current systems instead of allowing even
1 nanosecond to be wasted on retro-computing topics... and in any case
that hypothetical user is just about out of time because...
* Windows 10's announced EOL (no more updates/support, full screen
warning messages if past EOLs are anything to go by) coincides
approximately with PostgreSQL 18's expected release

(Huh, thinking about other synchronous signals, don't we also fail to
install a native SIGFPE handler on Windows, the kind that on Unix
converts e.g. div-by-zero in C code into an ereport?)
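
The Unix behaviour mentioned above can be sketched roughly as follows.
This is not PostgreSQL's handler: it substitutes sigsetjmp()/siglongjmp()
for the ereport machinery, every function name in it is made up for
illustration, and it uses raise(SIGFPE) to simulate the trap because a
real integer division by zero is undefined behaviour in C.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() and sigsetjmp() */
#endif

#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Where control resumes after a trapped SIGFPE (hypothetical name). */
static sigjmp_buf fpe_recovery_point;

/* Handler: abandon the faulting computation and jump back to safety. */
static void
fpe_handler(int signo)
{
    (void) signo;
    siglongjmp(fpe_recovery_point, 1);
}

/*
 * Run fn() with a SIGFPE handler installed.  Returns true if fn()
 * completed, false if it was interrupted by SIGFPE.  The previous
 * signal disposition is restored either way.
 */
static bool
run_guarding_sigfpe(void (*fn) (void))
{
    struct sigaction sa;
    struct sigaction old;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fpe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, &old);

    /* savemask = 1, so SIGFPE is unblocked again after the jump */
    if (sigsetjmp(fpe_recovery_point, 1) != 0)
    {
        sigaction(SIGFPE, &old, NULL);
        return false;           /* arrived here via siglongjmp() */
    }

    fn();
    sigaction(SIGFPE, &old, NULL);
    return true;
}

/* Stand-in for a computation that traps. */
static void
simulate_fpe(void)
{
    raise(SIGFPE);
}

/* Stand-in for a computation that does not trap. */
static void
harmless(void)
{
}
```

With these definitions, run_guarding_sigfpe(harmless) returns true and
run_guarding_sigfpe(simulate_fpe) returns false instead of killing the
process.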

Anyway I don't mind getting rid of the SIGILL stuff as long as the new
coding is tidy and cross-platform enough, if people are insisting.
I'm not aware that there's anything technically wrong with it on any
Unix system, but I agree that it's not beautiful code. FreeBSD was my
motivation at the time, but it has since gained elf_aux_info(), and
OpenBSD too, and I can help write/test that part if we go that way.
Or maybe we could just take some fragments of OpenSSL or liblzma or
whatever code if the licensing aspect is OK.
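
If the auxv route is taken, the probing could look something like the
sketch below. It is only an illustration, not proposed patch text: the
function names are hypothetical, only 64-bit ARM is handled, and the
fallback definition of HWCAP_CRC32 (bit 7) is what I understand the
arm64 headers on Linux and FreeBSD to use, so it should be checked
against the real headers.

```c
#include <stdbool.h>

#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#include <sys/auxv.h>           /* getauxval() or elf_aux_info() */
#endif

/* Normally supplied by the platform's arm64 headers; assumed value. */
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1UL << 7)
#endif

/* Pure helper: does this AT_HWCAP word advertise the CRC32 extension? */
static bool
hwcap_has_crc32(unsigned long hwcap)
{
    return (hwcap & HWCAP_CRC32) != 0;
}

/*
 * Ask the kernel, via the aux vector, whether the ARMv8 CRC32
 * instructions are usable.  Returns false wherever we don't know how
 * to ask, so callers fall back to the portable implementation.
 */
static bool
probe_armv8_crc32(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return hwcap_has_crc32(getauxval(AT_HWCAP));
#elif defined(__aarch64__) && (defined(__FreeBSD__) || defined(__OpenBSD__))
    unsigned long hwcap = 0;

    /* elf_aux_info() returns 0 on success */
    if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
        return false;
    return hwcap_has_crc32(hwcap);
#else
    return false;               /* NetBSD, non-ARM, anything unknown */
#endif
}
```

No signal handler is involved: the answer comes from data the kernel
already handed to the process at exec time.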

As for the other ARM platforms in our universe:

Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its
auxv[], or even expose it to user space nicely, and those libraries
don't seem to know of another way. Hopefully NetBSD will align with
those other systems for portability's sake. The only other way I
could find in a quick googling session is /usr/sbin/cpuctl, root only,
no cigar (but a solid clue that the information is floating around
somewhere, it just depends where exactly the root-only fencing is
happening). Even if a way can be found, it'd be better to be able to
do it the same way as other systems that we are actually testing if at
all possible. Put that way, the build farm absence is a sort of vote
for just not even trying to detect it on NetBSD. A NetBSD user who
really wants to access the feature could still compile with
-march=<new thing>, supply a build farm animal and a patch, or
preferably help get a compatible elf_aux_info(AT_HWCAP) merged into
NetBSD. That'd be my vote if we switch to auxv probing on the other
systems.

Macs don't do ELF or auxv, but there's a sysctl. I think it must
always report true in practice. The first M1 was ARMv8.4-A. (There
are old non-computer Apple devices that used ARMv8-A still in the
wild, which is interesting to me only because it explains the recent
invention of -moutline-atomics, about which more soon hopefully, to
get faster lwlocks etc on our -march=armv8-a builds as shipped by
Debian et al.) So I think we could skip the palaver and just
hard-code the knowledge that Macs can do this stuff.
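
For illustration, the two options for Macs might look like the sketch
below. The function names are hypothetical, and the sysctl name
"hw.optional.armv8_crc32" is my understanding of what macOS exposes, so
treat it as an assumption to verify on a real machine.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>         /* sysctlbyname() */
#endif

/*
 * Option 1: ask the kernel.  Returns 1 or 0 as reported, or -1 if the
 * question can't be asked (not macOS, or the sysctl is missing).
 */
static int
macos_sysctl_armv8_crc32(void)
{
#if defined(__APPLE__)
    int     value = 0;
    size_t  len = sizeof(value);

    /* assumed sysctl name, see the note above */
    if (sysctlbyname("hw.optional.armv8_crc32", &value, &len, NULL, 0) != 0)
        return -1;
    return value != 0;
#else
    return -1;
#endif
}

/*
 * Option 2: skip the probe and hard-code the answer, on the theory
 * that every arm64 Mac supports the instructions.
 */
static bool
assume_armv8_crc32(void)
{
#if defined(__APPLE__) && defined(__aarch64__)
    return true;
#else
    return false;
#endif
}
```

The second function is the "skip the palaver" variant: a compile-time
decision with no runtime probing at all.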

(Archeological note: the reason several systems have the same auxv[]
concept, i.e. a table of parameters that exec*() sets up alongside
argv[] and environ[] to communicate stuff about memory layout etc
primarily to libc, and the reason they even agree on some of the
parameter names, is that it came from SVR4 with ELF. I think Sun's
original version got AT_HWCAP from the ELF binary after selecting from
several object variants that were compiled to match various CPU
feature sets, as a way of shipping fat binaries that go faster on the
right hardware, while these modern systems are just passing on
whatever the CPU reports under the same hijacked name; it's related,
but different... Anyway they needed some way for the kernel to give
the features to user space on ARM, because its register that is
equivalent to x86's CPUID can't be accessed from user space's
privilege level and libc itself would like to be able to use some
fancy features. Even for non-ARM architectures it's nice not to have
to break out the assembler in user programs that want to do the same
sorts of tricks. Amazingly, illumos can apparently run on ARM now so
we might in theory encounter the old meaning of AT_HWCAP, but that's
quite a hypothetical unicorn so I'm filing the thought under
archeology and hiding it in parentheses.)

Note that Andres recently wondered out loud[2] if CRC32 might be
fundamentally the wrong tool for the job in a related thread, so
perhaps this will all become moot if someone does the research and
replaces it, but that's vapourware for now...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c
