From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: use ARM intrinsics in pg_lfind32() where available |
Date: | 2022-08-27 23:00:49 |
Message-ID: | 20220827230049.GA111000@nathanxps13 |
Lists: | pgsql-hackers |
On Sun, Aug 28, 2022 at 10:39:09AM +1200, Thomas Munro wrote:
> On Sun, Aug 28, 2022 at 10:12 AM Nathan Bossart
> <nathandbossart(at)gmail(dot)com> wrote:
>> Yup. The problem is that AFAICT there's no equivalent to
>> _mm_movemask_epi8() on aarch64, so you end up with something like
>>
>> vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0
>>
>> But for pg_lfind32(), we really just want to know if any lane is set, which
>> only requires a call to vmaxvq_u32(). I haven't had a chance to look too
>> closely, but my guess is that this ultimately results in an extra AND
>> operation in the aarch64 path, so maybe it doesn't impact performance too
>> much. The other option would be to open-code the intrinsic function calls
>> into pg_lfind.h. I'm trying to avoid the latter, but maybe it's the right
>> thing to do for now... What do you think?
>
> Ahh, this gives me a flashback to John's UTF-8 validation thread[1]
> (the beginner NEON hackery in there was just a learning exercise,
> sadly not followed up with real patches...). He had
> _mm_movemask_epi8(v) != 0 which I first translated to
> to_bool(bitwise_and(v, vmovq_n_u8(0x80))) and he pointed out that
> vmaxvq_u8(v) > 0x7F has the right effect without the and.
I knew there had to be an easier way! I'll give this a try. Thanks.
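
For reference, here is a minimal sketch of the two checks discussed above. The helper names (any_high_bit_set, any_lane_set_u32) are made up for illustration and are not the simd.h API. The SSE2 and scalar branches are only there so the sketch builds on non-ARM machines. It assumes unaligned 16-byte loads are acceptable.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: true if any of the 16 bytes at p has its high bit
 * (0x80) set.  This is the "_mm_movemask_epi8(v) != 0" test.
 */
static inline bool
any_high_bit_set(const uint8_t *p)
{
#if defined(__aarch64__)
	/* The largest byte exceeds 0x7F iff some byte has its high bit set. */
	return vmaxvq_u8(vld1q_u8(p)) > 0x7F;
#elif defined(__SSE2__)
	return _mm_movemask_epi8(_mm_loadu_si128((const __m128i *) p)) != 0;
#else
	/* Scalar fallback: OR all bytes together, then test the high bit. */
	uint8_t		acc = 0;

	for (int i = 0; i < 16; i++)
		acc |= p[i];
	return (acc & 0x80) != 0;
#endif
}

/*
 * Hypothetical helper: true if any of the four 32-bit lanes at p is
 * nonzero.  This is all pg_lfind32() needs from a comparison result.
 */
static inline bool
any_lane_set_u32(const uint32_t *p)
{
#if defined(__aarch64__)
	/* The horizontal maximum is nonzero iff some lane is nonzero. */
	return vmaxvq_u32(vld1q_u32(p)) != 0;
#elif defined(__SSE2__)
	/* Compare each lane with zero; a mask of 0xFFFF means all were zero. */
	__m128i		v = _mm_loadu_si128((const __m128i *) p);
	__m128i		is_zero = _mm_cmpeq_epi32(v, _mm_setzero_si128());

	return _mm_movemask_epi8(is_zero) != 0xFFFF;
#else
	return (p[0] | p[1] | p[2] | p[3]) != 0;
#endif
}
```

On aarch64 neither helper needs the AND with 0x80: each is a single horizontal-max reduction followed by a scalar compare.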
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com