Re: Popcount optimization using AVX512

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-07-31 00:49:59
Message-ID: 20240731004959.6ys24432n6xlgemk@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2024-07-30 16:32:07 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
> > Now, a reasonable counter-argument would be that only some of these macros are
> > defined for msvc ([1]). However, as it turns out, the test is broken
> > today, as msvc doesn't error out when using an intrinsic that's not
> > "available" by the target architecture, it seems to assume that the caller did
> > a cpuid check ahead of time.
> >
> >
> > Check out [2], it shows the various predefined macros for gcc, clang and msvc.
> >
> >
> > ISTM that the msvc checks for xsave/avx512 being broken should be an open
> > item?
>
> I'm not following this one. At the moment, we always do a runtime check
> for the AVX-512 stuff, so in the worst case we'd check CPUID at startup and
> set the function pointers appropriately, right? We could, of course, still
> fix it, though.

Ah, I somehow thought we'd avoid the runtime check in case we determine at
compile time we don't need any extra flags to enable the AVX512 stuff (similar
to how we deal with crc32). But it looks like that's not the case - which
seems pretty odd to me:

This turns something that can be a single instruction into an indirect
function call, even if we could know that it's guaranteed to be available for
the compilation target, due to -march=....

It's one thing for the avx512 path to have that overhead, but it's
particularly absurd for pg_popcount32/pg_popcount64, where:

a) the function call overhead is a larger proportion of the cost, and
b) the instruction is almost universally available, including in the
architecture baseline x86-64-v2, which several distros are using as the
x86-64 baseline.
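
To make the distinction concrete, here is a minimal sketch of the two shapes
being contrasted. It is not PostgreSQL's actual code: every sketch_* name is
hypothetical, it assumes gcc or clang (which define __POPCNT__ when the
target guarantees the instruction, and provide __builtin_popcount), and the
runtime "choose" step is stubbed out rather than consulting CPUID.

```c
#include <stdint.h>

#if defined(__POPCNT__)
/*
 * The compilation target guarantees POPCNT (e.g. -mpopcnt or
 * -march=x86-64-v2), so the call can be inlined to a single instruction.
 */
static inline int
sketch_popcount32(uint32_t word)
{
    return __builtin_popcount(word);
}
#else
/* Portable fallback: clear the lowest set bit until none remain. */
static int
sketch_popcount32_slow(uint32_t word)
{
    int         count = 0;

    while (word != 0)
    {
        word &= word - 1;
        count++;
    }
    return count;
}

/* Runtime dispatch: the pointer is resolved on the first call. */
static int  sketch_popcount32_choose(uint32_t word);
static int  (*sketch_popcount32_ptr) (uint32_t) = sketch_popcount32_choose;

static int
sketch_popcount32_choose(uint32_t word)
{
    /*
     * Assumption: a real implementation would check CPUID here and might
     * install a POPCNT-based function.  This sketch always installs the
     * portable fallback.
     */
    sketch_popcount32_ptr = sketch_popcount32_slow;
    return sketch_popcount32_ptr(word);
}

/* Every call pays for an indirect function call. */
static inline int
sketch_popcount32(uint32_t word)
{
    return sketch_popcount32_ptr(word);
}
#endif
```

With the first branch a caller such as sketch_popcount32(0xFF) compiles down
to the instruction itself; with the second, the same call always goes through
the pointer, which is the overhead described above.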

Why are we actually checking for xsave? We're not using xsave itself, and I
couldn't find a comment in 792752af4eb5 explaining what we're using it as a
proxy for. Is that just to know if _xgetbv() exists? Is it actually possible
that xsave isn't available when avx512 is?
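
For reference, a sketch of the conventional runtime detection sequence as I
understand it from the CPUID/XGETBV documentation, not a copy of what the
tree does. It assumes gcc or clang on x86 (<cpuid.h> and inline asm); the
function name is hypothetical. In this sequence the OSXSAVE bit is only
consulted to know that XGETBV may be executed, and XGETBV in turn tells us
whether the OS saves the ZMM state.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: can AVX-512 popcount instructions be used? */
static bool
sketch_avx512_popcnt_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t    xcr0_lo, xcr0_hi;

    /* CPUID.1:ECX[27] (OSXSAVE): the OS enabled XSAVE, XGETBV is usable. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return false;

    /* Read XCR0: bits 1,2 (SSE, AVX) and 5,6,7 (opmask, ZMM) must be set. */
    __asm__ __volatile__("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void) xcr0_hi;
    if ((xcr0_lo & 0xe6) != 0xe6)
        return false;

    /* CPUID.(EAX=7,ECX=0): EBX[16] = AVX512F, ECX[14] = AVX512VPOPCNTDQ. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx & (1u << 16)) != 0 && (ecx & (1u << 14)) != 0;
#else
    /* Not x86: nothing to detect. */
    return false;
#endif
}
```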

Greetings,

Andres Freund
