From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2023-11-07 05:53:15 |
Message-ID: | 20231107055315.8e@rfd.leadboat.com |
Lists: | pgsql-hackers |
On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
> > On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
> >> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> >> > Like I said, I don't have any proposals yet, but assuming we do want to
> >> > support newer intrinsics, either open-coded or via auto-vectorization, I
> >> > suspect we'll need to gather consensus for a new policy/strategy.
> >>
> >> Yeah. The function-pointer solution kind of sucks, because for the
> >> sort of operation we're considering here, adding a call and return
> >> is probably order-of-100% overhead. Worse, it adds similar overhead
> >> for everyone who doesn't get the benefit of the optimization.
> >
> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
> > a function implementation based on the runtime CPU, without incurring function
> > pointer overhead. I would not attempt to use AVX512 on non-glibc systems, and
> > I would use ifunc to select the desired popcount implementation on glibc:
> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
>
> Thanks, that seems promising for the function pointer cases. I'll plan on
> trying to convert one of the existing ones to use it. BTW it looks like
> LLVM has something similar [0].
>
> IIUC this unfortunately wouldn't help for cases where we wanted to keep
> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
> unless we applied it to the calling functions, but that doesn't sound
> particularly maintainable.
Agreed, it doesn't solve inline cases. If the gains are big enough, we should
move toward packages containing N CPU-specialized copies of the postgres
binary, with bin/postgres just exec'ing the right one.
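
For readers unfamiliar with the ifunc mechanism mentioned above, here is a minimal sketch of how it could select a popcount implementation at load time. It is only an illustration, not code from the PostgreSQL tree: the names `popcount64`, `popcount64_hw`, `popcount64_portable`, and `resolve_popcount64` are hypothetical, and it dispatches on the plain POPCNT instruction rather than AVX512 to stay short. It assumes GCC or Clang on x86-64 with glibc, and falls back to a plain function everywhere else.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount64_portable(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__GLIBC__) && defined(__x86_64__) && defined(__GNUC__)

/* Built with POPCNT enabled for this one function only. */
__attribute__((target("popcnt")))
static int
popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}

/*
 * Resolver (hypothetical name): the dynamic linker calls this when it
 * relocates popcount64 and binds the symbol to whichever function is
 * returned, so later calls are ordinary direct calls.
 */
static int (*resolve_popcount64(void))(uint64_t)
{
    __builtin_cpu_init();       /* required before __builtin_cpu_supports */
    return __builtin_cpu_supports("popcnt") ? popcount64_hw
                                            : popcount64_portable;
}

int popcount64(uint64_t x) __attribute__((ifunc("resolve_popcount64")));

#else

/* Assumption: without glibc ifunc support, just use the portable version. */
int
popcount64(uint64_t x)
{
    return popcount64_portable(x);
}

#endif
```

Callers simply call `popcount64(value)`. As noted in the thread, this only helps out-of-line functions; anything that must stay inlined would need the attribute applied to its callers instead.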