From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-04-18 19:53:46 |
Message-ID: | 20240418195346.GA3506520@nathanxps13 |
Lists: | pgsql-hackers |
On Thu, Apr 18, 2024 at 06:12:22PM +0000, Shankaran, Akash wrote:
> Good find. I confirmed this after speaking with an Intel expert and checking section 14.3 of the Intel AVX-512 manual [0], which recommends checking bit 27. From the manual:
>
> "Prior to using Intel AVX, the application must identify that the operating system supports the XGETBV instruction,
> the YMM register state, in addition to processor's support for YMM state management using XSAVE/XRSTOR and
> AVX instructions. The following simplified sequence accomplishes both and is strongly recommended.
> 1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use).
> 2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state are enabled by OS).
> 3) Detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
> (Step 3 can be done in any order relative to 1 and 2.)"
Thanks for confirming. IIUC my patch should be sufficient, then.
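For illustration, the quoted three-step sequence could be sketched roughly as below. This is not the patch under discussion: the function name avx_usable_sketch and the macro names are hypothetical, and the sketch takes the CPUID.1:ECX value and the XCR0 value as plain arguments so the bit logic can be read on its own. Real code would obtain them from the CPUID instruction and from XGETBV, and would only issue XGETBV after the OSXSAVE bit has been seen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the quoted manual text. */
#define CPUID1_ECX_OSXSAVE	(UINT32_C(1) << 27)	/* step 1 */
#define CPUID1_ECX_AVX		(UINT32_C(1) << 28)	/* step 3 */
#define XCR0_XMM_YMM_MASK	UINT64_C(0x6)		/* step 2: XCR0[2:1] */

/*
 * Hypothetical helper: returns true if AVX looks usable.
 *
 * cpuid1_ecx is the ECX output of CPUID leaf 1.  xcr0 is the value XGETBV
 * would return for XCR0; it is ignored unless OSXSAVE is set, because
 * XGETBV must not be executed otherwise.
 */
static bool
avx_usable_sketch(uint32_t cpuid1_ecx, uint64_t xcr0)
{
	/* Step 1: the OS must have enabled XGETBV for application use. */
	if ((cpuid1_ecx & CPUID1_ECX_OSXSAVE) == 0)
		return false;

	/* Step 2: both XMM and YMM state must be enabled by the OS. */
	if ((xcr0 & XCR0_XMM_YMM_MASK) != XCR0_XMM_YMM_MASK)
		return false;

	/* Step 3: the processor must support the AVX instructions. */
	return (cpuid1_ecx & CPUID1_ECX_AVX) != 0;
}
```

Step 3 is placed last here only for readability; as the manual notes, it can be done in any order relative to steps 1 and 2.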
> It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE check in the popcount code.
This seems to contradict the note about doing step 3 at any point, and
given step 1 is the OSXSAVE check, I'm not following what this means,
anyway.
I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower half
of some of the ZMM registers is stored in the SSE and AVX state [0]. I
don't know how likely it is that 0xe0 would succeed but 0xe6 wouldn't, but
we might as well make it correct.
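To make the difference between the two masks concrete, here is a minimal sketch. The helper names zmm_check_loose and zmm_check_strict and the XCR0_* macro names are made up for this example; the bit positions are the usual XCR0 layout (SSE = bit 1, AVX = bit 2, opmask = bit 5, ZMM_Hi256 = bit 6, Hi16_ZMM = bit 7). The XCR0 value is passed in as an argument, so this only shows the mask logic. In real code it would come from _xgetbv(0) after the OSXSAVE check.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (illustrative macro names). */
#define XCR0_SSE		(UINT64_C(1) << 1)	/* XMM registers */
#define XCR0_AVX		(UINT64_C(1) << 2)	/* upper halves of YMM */
#define XCR0_OPMASK		(UINT64_C(1) << 5)	/* k0-k7 */
#define XCR0_ZMM_HI256	(UINT64_C(1) << 6)	/* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM	(UINT64_C(1) << 7)	/* ZMM16-31 */

/* 0xe0: only the three AVX-512 specific components. */
#define XCR0_AVX512_ONLY	(XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)
/* 0xe6: the above plus the SSE and AVX state holding the low ZMM halves. */
#define XCR0_AVX512_FULL	(XCR0_AVX512_ONLY | XCR0_SSE | XCR0_AVX)

/* The "!= 0" form: passes if any one of bits 5-7 is set. */
static bool
zmm_check_loose(uint64_t xcr0)
{
	return (xcr0 & XCR0_AVX512_ONLY) != 0;
}

/* The "== 0xe6" form: passes only if all five components are enabled. */
static bool
zmm_check_strict(uint64_t xcr0)
{
	return (xcr0 & XCR0_AVX512_FULL) == XCR0_AVX512_FULL;
}
```

With these definitions, an XCR0 value of 0xe0 (AVX-512 components set, SSE and AVX state cleared) passes the loose check and fails the strict one, which is the case in question. Note that the loose form also accepts a value with only one of bits 5-7 set.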
[0] https://en.wikipedia.org/wiki/Control_register#cite_ref-23
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com