From: | "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Subject: | RE: Popcount optimization using AVX512 |
Date: | 2024-04-18 18:12:22 |
Message-ID: | PH0PR11MB50007F79C92E3B0C7C1E6D6FF20E2@PH0PR11MB5000.namprd11.prod.outlook.com |
Lists: | pgsql-hackers |
> It was brought to my attention [0] that we probably should be checking for the OSXSAVE bit instead of the XSAVE bit when determining whether there's support for the XGETBV instruction. IIUC that should indicate that both the OS and the processor have XGETBV support (not just the processor).
> I've attached a one-line patch to fix this.
> [0] https://github.com/pgvector/pgvector/pull/519#issuecomment-2062804463
Good find. I confirmed this with an Intel expert and against section 14.3 of the Intel AVX-512 manual [0], which recommends checking bit 27 (OSXSAVE). From the manual:
"Prior to using Intel AVX, the application must identify that the operating system supports the XGETBV instruction,
the YMM register state, in addition to processor's support for YMM state management using XSAVE/XRSTOR and
AVX instructions. The following simplified sequence accomplishes both and is strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use).
2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state are enabled by OS).
3) Detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2.)"
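For illustration, the three steps quoted above could be sketched in C roughly as follows. This is not the PostgreSQL patch or its popcount code: the function and macro names are made up for the example, and it assumes GCC or Clang on x86 (for `<cpuid.h>` and the inline `xgetbv`). On other targets the sketch simply reports no AVX support.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper: __get_cpuid() */
#endif

/* Bit positions taken from the manual text quoted above. */
#define CPUID1_ECX_OSXSAVE  (UINT32_C(1) << 27)
#define CPUID1_ECX_AVX      (UINT32_C(1) << 28)
#define XCR0_XMM_YMM_STATE  UINT64_C(0x6)       /* XCR0[2:1] */

/* Step 1: has the OS enabled XGETBV for application use? */
static bool
ecx_has_osxsave(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/* Step 2: are both XMM and YMM state enabled in XCR0? */
static bool
xcr0_enables_ymm(uint64_t xcr0)
{
    return (xcr0 & XCR0_XMM_YMM_STATE) == XCR0_XMM_YMM_STATE;
}

/* Step 3: does the processor support AVX instructions? */
static bool
ecx_has_avx(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_AVX) != 0;
}

/* Hypothetical helper that runs the whole sequence. */
static bool
avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t    lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    /* XGETBV must not be issued unless OSXSAVE is set. */
    if (!ecx_has_osxsave(ecx))
        return false;

    /* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
    __asm__ __volatile__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    if (!xcr0_enables_ymm(((uint64_t) hi << 32) | lo))
        return false;

    return ecx_has_avx(ecx);
#else
    return false;
#endif
}
```

The point of the ordering is that XGETBV is only issued once the OSXSAVE bit has been seen. Checking the XSAVE bit (bit 26) alone would indicate processor support, not that the OS has enabled the instruction.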
It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE check in the popcount code.
[0]: https://cdrdv2.intel.com/v1/dl/getContent/671200
- Akash Shankaran