From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-08 10:18:09 |
Message-ID: | CAApHDvp2jx_=pFbgj-O1_ZmzP9WOZKfwLzZrS_=ZmbsqMQQ59g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 8 Nov 2024 at 18:34, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> I've done a round of comment and term cleanup for the whole patch,
> while on it.
I don't think "intrinsics" is the correct word to use here:
+ * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers
+ * to use SIMD intrinsics if available, up to the last aligned location
and
+ * All comparisons are combined with a single OR operation, making it a
+ * good candidate for SIMD intrinsics, if available.
an intrinsic function is a function built into the compiler that
provides some lower-level functionality. e.g. __builtin_popcount().
I'm slightly worried due to the current rate we're receiving cleanup
suggestions that someone might come along and think they'd be doing us
a favour by submitting a patch to "fixup the inefficient bitwise-ORs
and use boolean-OR". Maybe a comment like the following might prevent
that from happening.
+ * For performance reasons, we manually unroll this loop and purposefully
+ * use bitwise-ORs to combine each comparison. This prevents boolean
+ * short-circuiting and lets the compiler know that it's safe to access all 8
+ * elements regardless of the result of the other comparisons. This seems
+ * to be enough to coax a few compilers into using SIMD instructions.
> Btw, gcc seems a bit smarter than clang when it comes to optimizing
> the code depending on the size of the structures. gcc gives up on
> SIMD if it's sure that the structure on which we are going to use the
> allzero check won't need it at all, and clang keeps it even if it does
> not need it. That was interesting to see, while going through the
> review..
Can you share your test case for this? I tried with [1] and the
latest gcc does not seem to be smart enough to figure this out. I
tried adding some additional len checks that the compiler can use as a
cue and won't need to emit code for the checks providing the function
does get inlined. That was enough to get the compiler to not emit the
loops when they'll not be used. See the -DCHECK_LEN flag I'm passing
in the 2nd compiler window. I just don't know if putting something
like that into the code is a good idea as if the function wasn't
inlined for some reason, the extra len checks would have to be
compiled into the function.
David
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-11-08 10:56:05 | Re: Fix small typo, use InvalidRelFileNumber instead of InvalidOid |
Previous Message | Amit Kapila | 2024-11-08 09:51:47 | Re: Commit Timestamp and LSN Inversion issue |