Re: Popcount optimization using AVX512

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-07-31 20:52:54
Message-ID: 20240731205254.vfpap7uxwmebqeaf@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote:
> > On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
> >> My point is that _xgetbv() is made available by -mavx512vpopcntdq -mavx512bw
> >> alone, without needing -mxsave:
> >
> > Oh, I see. I'll work on a patch to remove that compiler check, then...
>
> As I started on this, I remembered why I needed it. The file
> pg_popcount_avx512_choose.c is compiled without the AVX-512 flags in order
> to avoid inadvertently issuing any AVX-512 instructions before determining
> we have support. If that's not a concern, we could still probably remove
> the XSAVE check.

I think it's a valid concern - but isn't that theoretically also an issue with
xsave itself? I guess practically the compiler won't do that, because there's
no practical reason to emit any instructions enabled by -mxsave (in contrast
to e.g. -mavx, which does trigger gcc to emit different instructions even for
basic math).

I think this is one of the few instances where msvc has the right approach -
if I use intrinsics to emit a specific instruction, the intrinsic should do
so, regardless of whether the compiler is allowed to do so on its own.

I think enabling options like these on a per-translation-unit basis isn't
really a scalable approach. To actually be safe there could only be a single
function in each TU and that function could only be called after a cpuid check
performed in a separate TU. That a) ends up pretty unreadable and b) requires
functions to be implemented in .c files, which we really don't want for some
of this.

I think we'd be better off enabling architectural features on a per-function
basis, roughly like this:
https://godbolt.org/z/a4q9Gc6Ez

For posterity, in the unlikely case anybody reads this after godbolt shuts
down:

I'm thinking we'd have an attribute like this:

/*
 * GCC-like compilers don't support intrinsics unless the corresponding
 * instruction set extension has been explicitly enabled. We can't just add
 * these options more widely, as that allows the compiler to emit such
 * instructions more widely, even if we gate reaching the code using
 * intrinsics. So we just enable the relevant support for individual
 * functions.
 *
 * In contrast to this, MSVC allows use of intrinsics independent of what the
 * compiler otherwise is allowed to emit.
 */
#ifdef __GNUC__
#define pg_enable_target(foo) __attribute__ ((__target__ (foo)))
#else
#define pg_enable_target(foo)
#endif

and then use that selectively for some functions:

/* FIXME: Should be gated by configure check of -mavx512vpopcntdq -mavx512bw support */
pg_enable_target("avx512vpopcntdq,avx512bw")
uint64_t
pg_popcount_avx512(const char *buf, int bytes)
...
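
To make the per-function idea a bit more concrete, here is a rough,
self-contained sketch of how such an attribute could be combined with a
runtime check. This is not the code from the patch or from the godbolt link:
the names popcount_portable, popcount_avx512_sketch, popcount_dispatch and
SKETCH_HAVE_AVX512_TARGET are made up for illustration, the function body is
only one plausible way to write an AVX-512 popcount, and
__builtin_cpu_supports() is used as a stand-in for the cpuid/_xgetbv() checks
discussed in this thread. It assumes a reasonably recent gcc or clang on
x86-64; everywhere else it falls back to the portable loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only use the attribute where it is known to work (assumption: x86-64). */
#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX512_TARGET 1
#define pg_enable_target(foo) __attribute__ ((__target__ (foo)))
#else
#define pg_enable_target(foo)
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static uint64_t
popcount_portable(const char *buf, size_t bytes)
{
	uint64_t	count = 0;

	for (size_t i = 0; i < bytes; i++)
		for (unsigned char b = (unsigned char) buf[i]; b != 0;
			 b &= (unsigned char) (b - 1))
			count++;
	return count;
}

#ifdef SKETCH_HAVE_AVX512_TARGET
/*
 * Hypothetical AVX-512 variant. Only this function is compiled with the
 * AVX-512 features enabled; the rest of the file keeps the default target.
 */
pg_enable_target("avx512vpopcntdq,avx512bw")
static uint64_t
popcount_avx512_sketch(const char *buf, size_t bytes)
{
	__m512i		accum = _mm512_setzero_si512();

	/* Full 64-byte chunks, using unaligned loads. */
	for (; bytes >= 64; buf += 64, bytes -= 64)
	{
		__m512i		val = _mm512_loadu_si512((const void *) buf);

		accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
	}

	/* Remaining 1..63 bytes: masked load, bytes masked off read as zero. */
	if (bytes > 0)
	{
		__mmask64	mask = ~UINT64_C(0) >> (64 - bytes);
		__m512i		val = _mm512_maskz_loadu_epi8(mask, (const void *) buf);

		accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
	}

	return (uint64_t) _mm512_reduce_add_epi64(accum);
}
#endif

/*
 * Compiled for the default target. The AVX-512 function is only reached
 * after the runtime check succeeds.
 */
uint64_t
popcount_dispatch(const char *buf, size_t bytes)
{
	assert(buf != NULL || bytes == 0);

#ifdef SKETCH_HAVE_AVX512_TARGET
	if (__builtin_cpu_supports("avx512vpopcntdq") &&
		__builtin_cpu_supports("avx512bw"))
		return popcount_avx512_sketch(buf, bytes);
#endif
	return popcount_portable(buf, bytes);
}
```

A caller would just use popcount_dispatch(buf, len). The point of the sketch
is that the dispatch function and the AVX-512 function can live in the same
file, with the extra instruction set extensions requested only for the one
function that uses the intrinsics.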

Greetings,

Andres Freund
