From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-03-29 15:35:15 |
Message-ID: | 20240329153515.GA1046039@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 28, 2024 at 10:03:04PM +0000, Amonson, Paul D wrote:
>> * I think we need to verify there isn't a huge performance regression for
>> smaller arrays. IIUC those will still require an AVX512 instruction or
>> two as well as a function call, which might add some noticeable overhead.
>
> Not considering your changes, I had already tested small buffers. At less
> than 512 bytes there was no measurable regression (there was one extra
> condition check) and for 512+ bytes it moved from no regression to some
> gains between 512 and 4096 bytes. Assuming you introduced no extra
> function calls, it should be the same.
Cool. I think we should run the benchmarks again to be safe, though.
>> I forgot to mention that I also want to understand whether we can
>> actually assume availability of XGETBV when CPUID says we support
>> AVX512:
>
> You cannot assume as there are edge cases where AVX-512 was found on
> system one during compile but it's not actually available in a kernel on
> a second system at runtime despite the CPU actually having the hardware
> feature.
Yeah, I understand that much, but I want to know how portable the XGETBV
instruction is. Unless I can assume that all x86_64 systems and compilers
support that instruction, we might need an additional configure check
and/or CPUID check. It looks like MSVC has had support for the _xgetbv
intrinsic for quite a while, but I'm still researching the other cases.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2024-03-29 15:42:41 | Re: Popcount optimization using AVX512 |
Previous Message | Pavel Borisov | 2024-03-29 15:23:00 | Re: Table AM Interface Enhancements |