Re: use ARM intrinsics in pg_lfind32() where available

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: use ARM intrinsics in pg_lfind32() where available
Date:	2022-08-29 05:44:49
Message-ID:	20220829054449.GA399604@nathanxps13
Lists:	pgsql-hackers

On Mon, Aug 29, 2022 at 11:25:50AM +0700, John Naylor wrote:
> + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
> + uint32 nelem_per_iteration = 4 * nelem_per_vector;
>
> Using local #defines would be my style. I don't have a reason to
> object to this way, but adding const makes these vars more clear.

I added const.

> Speaking of const:
>
> - const __m128i tmp1 = _mm_or_si128(result1, result2);
> - const __m128i tmp2 = _mm_or_si128(result3, result4);
> - const __m128i result = _mm_or_si128(tmp1, tmp2);
> + tmp1 = vector32_or(result1, result2);
> + tmp2 = vector32_or(result3, result4);
> + result = vector32_or(tmp1, tmp2);
>
> Any reason to throw away the const declarations?

The only reason is that I had to move the declarations to before the
vector32_load() calls.

> +static inline bool
> +vector32_is_highbit_set(const Vector32 v)
> +{
> +#ifdef USE_SSE2
> + return (_mm_movemask_epi8(v) & 0x8888) != 0;
> +#endif
> +}
>
> I'm not sure why we need this function -- AFAICS it just adds more
> work on x86 for zero benefit. For our present application, can we just
> cast to Vector8 (for Arm's sake) and call the 8-bit version?

Good idea.
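
To illustrate why the 8-bit check is sufficient here, below is a minimal scalar sketch, not code from the patch. It assumes the vector being tested is the result of a 32-bit lane-wise comparison, so every lane is either all ones or all zeros. The vec128 type and the helpers movemask_u8(), cmpeq_u32(), highbit_set_32() and highbit_set_8() are hypothetical stand-ins that only model the behaviour of the intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector: 16 bytes. */
typedef struct
{
	uint8_t		b[16];
} vec128;

/* Scalar model of _mm_movemask_epi8: bit i is the high bit of byte i. */
static uint32_t
movemask_u8(vec128 v)
{
	uint32_t	mask = 0;

	for (int i = 0; i < 16; i++)
		mask |= (uint32_t) (v.b[i] >> 7) << i;
	return mask;
}

/* Model of a 32-bit lane-wise equality test: matching lanes become all ones. */
static vec128
cmpeq_u32(const uint32_t lanes[4], uint32_t key)
{
	vec128		r;

	for (int i = 0; i < 4; i++)
	{
		uint32_t	m = (lanes[i] == key) ? 0xFFFFFFFFu : 0;

		memcpy(&r.b[4 * i], &m, sizeof(m));
	}
	return r;
}

/* 32-bit flavour: 0x8888 keeps only bits 3, 7, 11 and 15 of the mask. */
static bool
highbit_set_32(vec128 v)
{
	return (movemask_u8(v) & 0x8888) != 0;
}

/* 8-bit flavour: any byte with its high bit set counts. */
static bool
highbit_set_8(vec128 v)
{
	return movemask_u8(v) != 0;
}
```

For comparison results the two functions always agree, because a matching lane sets the high bit of all four of its bytes. They can differ for arbitrary input, which is why this only holds for the present application.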

> - * operations using bitwise operations on unsigned integers.
> + * operations using bitwise operations on unsigned integers. Note that many
> + * of the functions in this file presently do not have non-SIMD
> + * implementations.
>
> It's unclear to the reader whether this is a matter of 'round-to-it's.
> I'd like to document what I asserted in this thread, that it's likely
> not worthwhile to do anything with a uint64 representing two 32-bit
> ints. (It *is* demonstrably worth it for handling 8 byte-values at a
> time)

Done.
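
For readers unfamiliar with the 8-bytes-at-a-time case mentioned above, here is a rough sketch of the well-known "has zero byte" trick on a plain uint64. It is an illustration under my own naming, not the code in the patch: has_zero_byte() and chunk_has_byte() are hypothetical helpers. One 64-bit operation covers eight byte values, whereas the same word would hold only two 32-bit integers, which is the intuition behind the quoted remark.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if any of the 8 bytes packed into v is zero. */
static bool
has_zero_byte(uint64_t v)
{
	return ((v - UINT64_C(0x0101010101010101)) & ~v &
			UINT64_C(0x8080808080808080)) != 0;
}

/* True if any of the 8 bytes starting at p equals c. */
static bool
chunk_has_byte(const uint8_t *p, uint8_t c)
{
	uint64_t	chunk;

	/* memcpy avoids alignment and aliasing problems */
	memcpy(&chunk, p, sizeof(chunk));

	/* Broadcast c to every byte; XOR turns matching bytes into zero. */
	return has_zero_byte(chunk ^ (UINT64_C(0x0101010101010101) * c));
}
```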

> * Use saturating subtraction to find bytes <= c, which will present as
> - * NUL bytes in 'sub'.
> + * NUL bytes.
>
> I'd like to point out that the reason to do it this way is to
> work around SIMD architectures' frequent lack of unsigned comparison.

Done.
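
As a scalar illustration of the technique (a sketch, not the patch itself): for unsigned bytes, the saturating difference v - c is zero exactly when v <= c, so an equality test against zero can stand in for the missing unsigned comparison. The helper names sat_sub_u8() and any_byte_le() are made up for this example. On real hardware the subtraction would be a single vector instruction, for instance _mm_subs_epu8 on SSE2 or vqsubq_u8 on Arm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned saturating subtraction: clamps at 0 instead of wrapping. */
static uint8_t
sat_sub_u8(uint8_t a, uint8_t b)
{
	return (a > b) ? (uint8_t) (a - b) : 0;
}

/* True if any of the n bytes in v is <= c. */
static bool
any_byte_le(const uint8_t *v, int n, uint8_t c)
{
	for (int i = 0; i < n; i++)
	{
		/* v[i] <= c presents as a zero (NUL) byte after the subtraction */
		if (sat_sub_u8(v[i], c) == 0)
			return true;
	}
	return false;
}
```

Note that values above 0x7F are handled correctly here, which is the case a signed byte comparison would get wrong.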

> + * Return the result of subtracting the respective elements of the input
> + * vectors using saturation.
>
> I wonder if we should explain briefly what saturating arithmetic is. I
> had never encountered it outside of a SIMD programming context.

Done.
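
For reference, a small sketch of what "saturating" means for unsigned 8-bit values, compared with ordinary wraparound arithmetic. The function names are hypothetical and this is only a scalar model of what the vector instructions do per element.

```c
#include <stdint.h>

/* Ordinary modular arithmetic: 5 - 10 wraps around to 251. */
static uint8_t
wrapping_sub_u8(uint8_t a, uint8_t b)
{
	return (uint8_t) (a - b);
}

/* Saturating subtraction: results below 0 are clamped to 0. */
static uint8_t
saturating_sub_u8(uint8_t a, uint8_t b)
{
	return (a > b) ? (uint8_t) (a - b) : 0;
}

/* Saturating addition: results above 255 are clamped to 255. */
static uint8_t
saturating_add_u8(uint8_t a, uint8_t b)
{
	unsigned int sum = (unsigned int) a + b;

	return (sum > UINT8_MAX) ? UINT8_MAX : (uint8_t) sum;
}
```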

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v6-0001-abstract-architecture-specific-implementation-det.patch	text/x-diff	9.1 KB
v6-0002-use-ARM-Advanced-SIMD-intrinsic-functions-where-a.patch	text/x-diff	4.0 KB

In response to

Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-29 04:25:50 from John Naylor

Responses

Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-29 07:51:03 from John Naylor
