Re: Popcount optimization using AVX512

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-07-31 01:20:34
Message-ID: ZqmRYh3iikm1Kh3D@nathan
Lists: pgsql-hackers

On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> Ah, I somehow thought we'd avoid the runtime check in case we determine at
> compile time we don't need any extra flags to enable the AVX512 stuff (similar
> to how we deal with crc32). But it looks like that's not the case - which
> seems pretty odd to me:
>
> This turns something that can be a single instruction into an indirect
> function call, even if we could know that it's guaranteed to be available for
> the compilation target, due to -march=....
>
> It's one thing for the avx512 path to have that overhead, but it's
> particularly absurd for pg_popcount32/pg_popcount64, where
>
> a) The function call overhead is a larger proportion of the cost.
> b) the instruction is almost universally available, including in the
> architecture baseline x86-64-v2, which several distros are using as the
> x86-64 baseline.

Yeah, pg_popcount32/64 have been doing this since v12 (02a6a54). Until v17
(cc4826d), pg_popcount() repeatedly called these function pointers, too. I
think it'd be awesome if we could start requiring some of these "almost
universally available" instructions, but AFAICT that brings its own
complexity [0].
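
To illustrate the trade-off being discussed, here is a minimal sketch (not
the actual PostgreSQL code) of choosing between a direct instruction and a
function pointer at compile time. It assumes GCC or Clang, which define
__POPCNT__ when the compilation target guarantees POPCNT (for example via
-mpopcnt or -march=x86-64-v2). The names example_popcount64,
popcount64_slow, popcount64_choose, and popcount64_impl are hypothetical,
and the runtime CPUID probe is elided: the chooser here always installs the
portable fallback.

```c
#include <stdint.h>

#if defined(__POPCNT__)

/*
 * The target guarantees POPCNT, so the builtin can compile down to the
 * instruction itself with no pointer indirection.
 */
static inline int
example_popcount64(uint64_t word)
{
	return __builtin_popcountll(word);
}

#else

/* Portable fallback: clears the lowest set bit on each iteration. */
static int
popcount64_slow(uint64_t word)
{
	int			count = 0;

	while (word != 0)
	{
		word &= word - 1;
		count++;
	}
	return count;
}

static int	popcount64_choose(uint64_t word);

/* Starts out pointing at the chooser, which replaces itself on first use. */
static int	(*popcount64_impl) (uint64_t word) = popcount64_choose;

static int
popcount64_choose(uint64_t word)
{
	/*
	 * Assumption for this sketch: a real build would consult CPUID here and
	 * install a POPCNT-based routine where available.  We always install
	 * the portable fallback.
	 */
	popcount64_impl = popcount64_slow;
	return popcount64_impl(word);
}

/* Every call pays for an indirect function call. */
static inline int
example_popcount64(uint64_t word)
{
	return popcount64_impl(word);
}

#endif
```

In the first branch callers get the instruction inline. In the second,
every call goes through the pointer, which is the overhead described in
the quoted text above.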

> Why are we actually checking for xsave? We're not using xsave itself and I
> couldn't find a comment in 792752af4eb5 explaining what we're using it as a
> proxy for? Is that just to know if _xgetbv() exists? Is it actually possible
> that xsave isn't available when avx512 is?

Yes, it's to verify we have XGETBV, which IIUC requires support from both
the processor and the OS (see 598e011 and upthread discussion). AFAIK the
way we are detecting AVX-512 support is quite literally by-the-book unless
I've gotten something wrong.
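
For reference, a rough sketch of the kind of check described above,
assuming GCC or Clang on x86 (<cpuid.h> and inline assembly). This is not
the code from the tree. The function and macro names are hypothetical, and
the bit positions are the ones I understand the Intel manuals to document:
CPUID.1:ECX bit 27 (OSXSAVE), XCR0 bits 1-2 and 5-7 (SSE, AVX, opmask, and
ZMM state), CPUID.(7,0):EBX bit 16 (AVX512F), and CPUID.(7,0):ECX bit 14
(AVX512_VPOPCNTDQ).

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define EXAMPLE_HAVE_X86 1
#endif

/* XCR0 bits 1-2: SSE and AVX state; bits 5-7: opmask, ZMM_Hi256, Hi16_ZMM */
#define EXAMPLE_XCR0_ZMM_MASK UINT64_C(0xe6)

/* Has the OS enabled saving/restoring of all the state AVX-512 needs? */
static bool
example_xcr0_has_zmm_state(uint64_t xcr0)
{
	return (xcr0 & EXAMPLE_XCR0_ZMM_MASK) == EXAMPLE_XCR0_ZMM_MASK;
}

/* OSXSAVE set means the OS uses XSAVE, so XGETBV is safe to execute. */
static bool
example_osxsave_enabled(void)
{
#ifdef EXAMPLE_HAVE_X86
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx & (1u << 27)) != 0;
#else
	return false;
#endif
}

static bool
example_avx512_popcnt_available(void)
{
#ifdef EXAMPLE_HAVE_X86
	unsigned int eax, ebx, ecx, edx;
	uint32_t	lo, hi;

	/* Step 1: XGETBV must be usable, or the next step would fault. */
	if (!example_osxsave_enabled())
		return false;

	/* Step 2: read XCR0 and confirm the OS manages the ZMM registers. */
	__asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
	if (!example_xcr0_has_zmm_state(((uint64_t) hi << 32) | lo))
		return false;

	/* Step 3: confirm the processor implements the instructions. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return (ebx & (1u << 16)) != 0 && (ecx & (1u << 14)) != 0;
#else
	return false;
#endif
}
```

The ordering matters: the OSXSAVE test is only there to make the XGETBV
read safe, and the XCR0 test covers the OS half of the requirement that
the CPUID feature bits alone cannot tell you.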

[0] https://postgr.es/m/ZmpG2ZzT30Q75BZO%40nathan

--
nathan
