Re: Popcount optimization using AVX512

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-07-31 01:46:51
Message-ID: 20240731014651.byn3exbu6n5v2yqa@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> > Ah, I somehow thought we'd avoid the runtime check in case we determine at
> > compile time we don't need any extra flags to enable the AVX512 stuff (similar
> > to how we deal with crc32). But it looks like that's not the case - which
> > seems pretty odd to me:
> >
> > This turns something that can be a single instruction into an indirect
> > function call, even if we could know that it's guaranteed to be available for
> > the compilation target, due to -march=....
> >
> > It's one thing for the avx512 path to have that overhead, but it's
> > particularly absurd for pg_popcount32/pg_popcount64, where
> >
> > a) The function call overhead is a larger proportion of the cost.
> > b) the instruction is almost universally available, including in the
> > architecture baseline x86-64-v2, which several distros are using as the
> > x86-64 baseline.
>
> Yeah, pg_popcount32/64 have been doing this since v12 (02a6a54). Until v17
> (cc4826d), pg_popcount() repeatedly calls these function pointers, too. I
> think it'd be awesome if we could start requiring some of these "almost
> universally available" instructions, but AFAICT that brings its own
> complexity [0].

I'll respond there...

> > Why are we actually checking for xsave? We're not using xsave itself and I
> > couldn't find a comment in 792752af4eb5 explaining what we're using it as a
> > proxy for? Is that just to know if _xgetbv() exists? Is it actually possible
> > that xsave isn't available when avx512 is?
>
> Yes, it's to verify we have XGETBV, which IIUC requires support from both
> the processor and the OS (see 598e011 and upthread discussion). AFAIK the
> way we are detecting AVX-512 support is quite literally by-the-book unless
> I've gotten something wrong.

I'm basically wondering whether we need to check for compiler (not OS support)
support for xsave if we also check for -mavx512vpopcntdq -mavx512bw
support. Afaict the latter implies support for xsave.

andres(at)alap6:~$ echo|gcc -c - -march=x86-64 -xc -dM -E - -o -|grep '__XSAVE__'
andres(at)alap6:~$ echo|gcc -c - -march=x86-64 -mavx512vpopcntdq -mavx512bw -xc -dM -E - -o -|grep '__XSAVE__'
#define __XSAVE__ 1
#define __XSAVE__ 1

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-07-31 02:01:31 Re: Popcount optimization using AVX512
Previous Message Tom Lane 2024-07-31 01:27:06 Re: Do we still need parent column in pg_backend_memory_context?