From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2023-11-07 20:14:41 |
Message-ID: | 20231107201441.GA898662@nathanxps13 |
Lists: | pgsql-hackers |

On Mon, Nov 06, 2023 at 09:53:15PM -0800, Noah Misch wrote:
> On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
>> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
>> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
>> > a function implementation based on the runtime CPU, without incurring function
>> > pointer overhead. I would not attempt to use AVX512 on non-glibc systems, and
>> > I would use ifunc to select the desired popcount implementation on glibc:
>> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
>>
>> Thanks, that seems promising for the function pointer cases. I'll plan on
>> trying to convert one of the existing ones to use it. BTW it looks like
>> LLVM has something similar [0].
>>
>> IIUC this unfortunately wouldn't help for cases where we wanted to keep
>> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
>> unless we applied it to the calling functions, but that doesn't sound
>> particularly maintainable.
>
> Agreed, it doesn't solve inline cases. If the gains are big enough, we should
> move toward packages containing N CPU-specialized copies of the postgres
> binary, with bin/postgres just exec'ing the right one.

I performed a quick test with ifunc on my x86 machine that ordinarily uses
the runtime checks for the CRC32C code, and I actually see a consistent
3.5% regression for pg_waldump -z on 100M 65-byte records. I've attached
the patch used for testing.

The multiple-copies-of-the-postgres-binary idea seems interesting. That's
probably not something that could be enabled by default, but perhaps we
could add support for a build option.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment | Content-Type | Size |
---|---|---|
ifunc_test.patch | text/x-diff | 1.6 KB |