From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-03-28 21:51:36 |
Message-ID: | 20240328215136.GA918358@nathanxps13 |
Lists: | pgsql-hackers |
On Thu, Mar 28, 2024 at 04:38:54PM -0500, Nathan Bossart wrote:
> Here is a v14 of the patch that I think is beginning to approach something
> committable. Besides general review and testing, there are two things that
> I'd like to bring up:
>
> * The latest patch set from Paul Amonson appeared to support MSVC in the
> meson build, but not the autoconf one. I don't have much expertise here,
> so the v14 patch doesn't have any autoconf/meson support for MSVC, which
> I thought might be okay for now. IIUC we assume that 64-bit/MSVC builds
> can always compile the x86_64 popcount code, but I don't know whether
> that's safe for AVX512.
>
> * I think we need to verify there isn't a huge performance regression for
> smaller arrays. IIUC those will still require an AVX512 instruction or
> two as well as a function call, which might add some noticeable overhead.

I forgot to mention that I also want to understand whether we can actually
assume availability of XGETBV when CPUID says we support AVX512:

> + /*
> + * We also need to check that the OS has enabled support for the ZMM
> + * registers.
> + */
> +#ifdef _MSC_VER
> + return (_xgetbv(0) & 0xe0) != 0;
> +#else
> + uint64 xcr = 0;
> + uint32 high;
> + uint32 low;
> +
> +__asm__ __volatile__(" xgetbv\n":"=a"(low), "=d"(high):"c"(xcr));
> + return (low & 0xe0) != 0;
> +#endif
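
To make the XGETBV question concrete, here is a minimal sketch of a more defensive check. It is not the patch's code: the helper names (osxsave_available, read_xcr0, xcr0_enables_zmm, zmm_regs_available) are hypothetical, and it assumes GCC or Clang with <cpuid.h> on x86. The idea is to consult the OSXSAVE bit (CPUID leaf 1, ECX bit 27) before executing XGETBV, since that bit is what indicates the OS has enabled XGETBV. The sketch also requires all of XCR0 bits 1, 2, 5, 6 and 7 (mask 0xe6), which is stricter than the hunk above, where "!= 0" passes if any one of bits 5-7 is set.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/*
 * XCR0 state bits: 1 = SSE (XMM), 2 = AVX (YMM upper halves),
 * 5 = opmask, 6 = ZMM_Hi256, 7 = Hi16_ZMM.
 */
#define XCR0_AVX512_MASK UINT64_C(0xe6)

/* True only if every state component AVX-512 needs is OS-enabled. */
static bool
xcr0_enables_zmm(uint64_t xcr0)
{
	return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}

/* CPUID.1:ECX bit 27 (OSXSAVE) says whether XGETBV may be executed. */
static bool
osxsave_available(void)
{
#ifdef HAVE_X86_CPUID
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx & (1u << 27)) != 0;
#else
	return false;				/* not x86: no XGETBV at all */
#endif
}

/* Read XCR0. Call only after osxsave_available() returned true. */
static uint64_t
read_xcr0(void)
{
#ifdef HAVE_X86_CPUID
	uint32_t	low;
	uint32_t	high;

	__asm__ __volatile__("xgetbv" : "=a"(low), "=d"(high) : "c"(0));
	return ((uint64_t) high << 32) | low;
#else
	return 0;
#endif
}

/* Assumed combined check: OSXSAVE first, then the XCR0 bits. */
static bool
zmm_regs_available(void)
{
	if (!osxsave_available())
		return false;
	return xcr0_enables_zmm(read_xcr0());
}
```

A caller would still need the separate CPUID leaf 7 checks for the AVX512 feature bits themselves. zmm_regs_available() only answers whether the OS saves and restores the ZMM state.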
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com