From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: add AVX2 support to simd.h |
Date: | 2024-03-19 16:30:33 |
Message-ID: | 20240319163033.GA952199@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 19, 2024 at 04:53:04PM +0700, John Naylor wrote:
> On Tue, Mar 19, 2024 at 10:16 AM Nathan Bossart
> <nathandbossart(at)gmail(dot)com> wrote:
>> 0002 does the opposite of this. That is, after we've completed as many
>> blocks as possible, we move the iterator variable back to "end -
>> block_size" and do one final iteration to cover all the remaining elements.
>
> Sounds similar in principle, but it looks really complicated. I don't
> think the additional loops and branches are a good way to go, either
> for readability or for branch prediction. My sketch has one branch for
> which loop to do, and then performs only one loop. Let's do the
> simplest thing that could work. (I think we might need a helper
> function to do the block, but the rest should be easy)
I tried to trim some of the branches, and came up with the attached patch.
I don't think this is exactly what you were suggesting, but I think it's
relatively close. My testing showed decent benefits from using 2 vectors
when there aren't enough elements for 4, so I've tried to keep that part
intact. This changes pg_lfind32() to something like:
if not many elements
process one by one
while enough elements for 4 registers remain
process with 4 registers
if no elements remain
return false
if more than 2-registers-worth of elements remain
do one iteration with 2 registers
do another iteration on last 2-registers-worth of elements
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
johns_suggestion.patch | text/x-diff | 2.1 KB |
image/jpeg | 19.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2024-03-19 16:40:25 | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Previous Message | Sushrut Shivaswamy | 2024-03-19 16:21:03 | Re: Read data from Postgres table pages |