From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-04-04 17:18:28 |
Message-ID: | 20240404171828.GA3866970@nathanxps13 |
Lists: | pgsql-hackers |
On Thu, Apr 04, 2024 at 04:28:58PM +1300, David Rowley wrote:
> On Thu, 4 Apr 2024 at 11:50, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> If we can verify this approach won't cause segfaults and can stomach the
>> regression between 8 and 16 bytes, I'd happily pivot to this approach so
>> that we can avoid the function call dance that I have in v25.
>
> If we're worried about regressions with some narrow range of byte
> values, wouldn't it make more sense to compare that to cc4826dd5~1 at
> the latest rather than to some version that's already probably faster
> than PG16?
Good point. When compared with REL_16_STABLE, Ants's idea still wins:
bytes       v25  v25+ants  REL_16_STABLE
    2  1108.205  1033.132       2039.342
    4  1311.227  1289.373       3207.217
    8  1927.954  2360.113       3200.238
   16  2281.091  2365.408       4457.769
   32  3856.992  2390.688       6206.689
   64  3648.72   3242.498       9619.403
  128  4108.549  3607.148      17912.081
  256  4910.076  4496.852      33591.385
As before, with 2 and 4 bytes, HEAD is using the inlined approach, but
REL_16_STABLE is doing a function call. For 8 bytes, REL_16_STABLE is
doing a function call as well as a call to a function pointer. At 16
bytes, it's doing a function call and two calls to a function pointer.
With Ants's approach, both 8 and 16 bytes require a single call to a
function pointer, and of course we are using the AVX-512 implementation for
both.
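To make the call counting above concrete, here is a rough sketch of the two dispatch shapes. It is not the code from v25 or from Ants's patch: every name is hypothetical, the portable bit-counting loops merely stand in for the CPU-specific and AVX-512 implementations, and the 8-byte inlining threshold is an assumption inferred from the numbers above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter, present only to show how many indirect calls occur. */
static int indirect_calls = 0;

/* Portable per-word popcount; stands in for a CPU-specific implementation. */
static int
popcount64_portable(uint64_t word)
{
    int         count = 0;

    while (word != 0)
    {
        word &= word - 1;       /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Per-word function pointer; a real build would choose the target at startup. */
static int  (*popcount64_ptr) (uint64_t) = popcount64_portable;

/* Older shape: a plain function that makes one indirect call per 8-byte word. */
static uint64_t
popcount_per_word(const char *buf, size_t bytes)
{
    uint64_t    total = 0;

    while (bytes >= 8)
    {
        uint64_t    word;

        memcpy(&word, buf, sizeof(word));
        indirect_calls++;
        total += popcount64_ptr(word);
        buf += 8;
        bytes -= 8;
    }
    while (bytes-- > 0)         /* leftover tail bytes */
        total += popcount64_portable((unsigned char) *buf++);
    return total;
}

/* Whole-buffer routine; stands in for an AVX-512 implementation. */
static uint64_t
popcount_buffer_portable(const char *buf, size_t bytes)
{
    uint64_t    total = 0;

    while (bytes-- > 0)
        total += popcount64_portable((unsigned char) *buf++);
    return total;
}

/* Whole-buffer function pointer, likewise chosen once at startup. */
static uint64_t (*popcount_buffer_ptr) (const char *, size_t) =
    popcount_buffer_portable;

/*
 * Newer shape: tiny inputs are counted inline; anything else costs exactly
 * one indirect call regardless of length.  The threshold of 8 is assumed.
 */
static inline uint64_t
popcount_dispatch(const char *buf, size_t bytes)
{
    if (bytes < 8)
    {
        uint64_t    total = 0;

        while (bytes-- > 0)
            total += popcount64_portable((unsigned char) *buf++);
        return total;
    }
    indirect_calls++;
    return popcount_buffer_ptr(buf, bytes);
}
```

In this sketch a 16-byte buffer costs two indirect calls through popcount_per_word() but only one through popcount_dispatch(), and a 4-byte buffer costs none through the latter.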
I think this is sufficient to justify switching approaches.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com