From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: add AVX2 support to simd.h |
Date: | 2024-03-21 17:09:44 |
Message-ID: | 20240321170944.GA1767527@nathanxps13 |
Lists: | pgsql-hackers |
On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> I'm much happier about v5-0001. With a small tweak it would match what
> I had in mind:
>
> + if (nelem < nelem_per_iteration)
> + goto one_by_one;
>
> If this were "<=" then for long arrays we could assume there is
> always more than one block, and wouldn't need to check if any elements
> remain -- first block, then a single loop and it's done.
>
> The loop could also then be a "do while" since it doesn't have to
> check the exit condition up front.
Good idea. That causes us to re-check all of the tail elements when the
number of elements is evenly divisible by nelem_per_iteration, but that
might be worth the trade-off.
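
For readers following along, the control flow under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patch: `lfind32_sketch`, `block_contains`, and the `NELEM_PER_ITERATION` value of 16 are hypothetical names and numbers chosen for the example, and the block comparison is a plain scalar loop standing in for the SIMD helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block size for illustration: 4 registers x 4 uint32 lanes. */
#define NELEM_PER_ITERATION 16

/* Scalar stand-in for the SIMD comparison of one full block (hypothetical). */
static bool
block_contains(uint32_t key, const uint32_t *block)
{
	bool		found = false;

	for (size_t j = 0; j < NELEM_PER_ITERATION; j++)
		found |= (block[j] == key);
	return found;
}

/* Hypothetical sketch of the "<=" check, do-while loop, and overlapping tail. */
static bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		i = 0;
	size_t		tail_idx;

	/* Short arrays: plain one-by-one search. */
	if (nelem <= NELEM_PER_ITERATION)
	{
		for (i = 0; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/* Round down to a multiple of the block size (a power of two). */
	tail_idx = nelem & ~((size_t) NELEM_PER_ITERATION - 1);

	/* At least one full block exists, so no up-front exit check is needed. */
	do
	{
		if (block_contains(key, &base[i]))
			return true;
		i += NELEM_PER_ITERATION;
	} while (i < tail_idx);

	/*
	 * Final block overlaps the elements already checked.  When nelem is an
	 * exact multiple of the block size, this re-checks the whole last block.
	 */
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);
}
```

With 40 elements, for example, this checks blocks starting at indexes 0 and 16 and then one overlapping block covering indexes 24 through 39; with 32 elements the final block covers 16 through 31 a second time, which is the redundant re-check mentioned above.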
> Yes, that spike is weird, because it seems super-linear. However, the
> more interesting question for me is: AVX2 isn't really buying much for
> the numbers covered in this test. Between 32 and 48 elements, and
> between 64 and 80, it's indistinguishable from SSE2. The jumps to the
> next shelf are postponed, but the jumps are just as high. From earlier
> system benchmarks, I recall it eventually wins out with hundreds of
> elements, right? Is that still true?
It does still eventually win, although not nearly to the same extent as
before. I extended the benchmark a bit to show this. I wouldn't be
devastated if we only got 0001 committed for v17, given these results.
> Further, now that the algorithm is more SIMD-appropriate, I wonder
> what doing 4 registers at a time is actually buying us for either SSE2
> or AVX2. It might just be a matter of scale, but that would be good to
> understand.
I'll follow up with these numbers shortly.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v6-0001-pg_lfind32-add-overlap-code-for-remaining-element.patch | text/x-diff | 3.8 KB |
v6-0002-Add-support-for-AVX2-in-simd.h.patch | text/x-diff | 4.8 KB |
(unnamed) | image/jpeg | 23.3 KB |
(unnamed) | image/jpeg | 20.0 KB |