From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Cc: | "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-04-18 21:01:58 |
Message-ID: | 20240418210158.GA3776258@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 18, 2024 at 08:24:03PM +0000, Devulapalli, Raghuveer wrote:
>> This seems to contradict the note about doing step 3 at any point, and
>> given step 1 is the OSXSAVE check, I'm not following what this means,
>> anyway.
>
> It is recommended that you run the xgetbv code before you check for cpu
> features avx512-popcnt and avx512-bw. The way it is written now is the
> opposite order. I would also recommend splitting the cpuid feature check
> for avx512popcnt/avx512bw and xgetbv section into separate functions to
> make them modular. Something like:
>
> static inline
> int check_os_avx512_support(void)
> {
> // (1) run cpuid leaf 1 to check for xgetbv instruction support:
> unsigned int exx[4] = {0, 0, 0, 0};
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
> if ((exx[2] & (1 << 27)) == 0) /* xsave */
> return false;
>
> /* Does XGETBV say the ZMM/YMM/XMM registers are enabled? */
> return (_xgetbv(0) & 0xe0) == 0xe0;
> }
>
>> I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
>> instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower
>> half of some of the ZMM registers is stored in the SSE and AVX state
>> [0]. I don't know how likely it is that 0xe0 would succeed but 0xe6
>> wouldn't, but we might as well make it correct.
>
> This is correct. It needs to check all the 3 bits (XMM/YMM and ZMM). The
> way it is written is now is in-correct.
Thanks for the feedback. I've attached an updated patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v2-0001-osxsave.patch | text/x-diff | 3.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2024-04-18 21:19:24 | Re: improve performance of pg_dump --binary-upgrade |
Previous Message | Thomas Munro | 2024-04-18 21:00:05 | Re: Cannot find a working 64-bit integer type on Illumos |