Re: define pg_structiszero(addr, s, r)

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: define pg_structiszero(addr, s, r)
Date: 2024-11-08 17:33:52
Message-ID: Zy5LgNyHzOhnYTTy@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Nov 08, 2024 at 11:18:09PM +1300, David Rowley wrote:
> On Fri, 8 Nov 2024 at 18:34, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > I've done a round of comment and term cleanup for the whole patch,
> > while on it.
>
> I don't think "intrinsics" is the correct word to use here:
>
> + * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers
> + * to use SIMD intrinsics if available, up to the last aligned location
>
> and
>
> + * All comparisons are combined with a single OR operation, making it a
> + * good candidate for SIMD intrinsics, if available.
>
> an intrinsic function is a function built into the compiler that
> provides some lower-level functionality. e.g. __builtin_popcount().

Agree, replaced by "instructions" in v10 attached.

> I'm slightly worried due to the current rate we're receiving cleanup
> suggestions that someone might come along and think they'd be doing us
> a favour by submitting a patch to "fixup the inefficient bitwise-ORs
> and use boolean-OR".

That's a good point, better to be cautious here.

> Maybe a comment like the following might prevent
> that from happening.
>
> + * For performance reasons, we manually unroll this loop and purposefully
> + * use bitwise-ORs to combine each comparison. This prevents boolean
> + * short-circuiting and lets the compiler know that it's safe to access all 8
> + * elements regardless of the result of the other comparisons. This seems
> + * to be enough to coax a few compilers into using SIMD instructions.

Sounds good to me, used the above in v10.

v10 also splits the patch into 2 parts as suggested by Michael up-thread.

>
> > Btw, gcc seems a bit smarter than clang when it comes to optimizing
> > the code depending on the size of the structures. gcc gives up on
> > SIMD if it's sure that the structure on which we are going to use the
> > allzero check won't need it at all, and clang keeps it even if it does
> > not need it. That was interesting to see, while going through the
> > review..
>
> Can you share your test case for this?

+1

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v10-0001-Optimize-pg_memory_is_all_zeros.patch text/x-diff 3.4 KB
v10-0002-Make-use-of-pg_memory_is_all_zeros-in-PageIsVeri.patch text/x-diff 1.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-11-08 17:44:24 Re: New GUC autovacuum_max_threshold ?
Previous Message Nathan Bossart 2024-11-08 17:31:17 Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C