From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-08 17:33:52 |
Message-ID: | Zy5LgNyHzOhnYTTy@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Fri, Nov 08, 2024 at 11:18:09PM +1300, David Rowley wrote:
> On Fri, 8 Nov 2024 at 18:34, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > I've done a round of comment and term cleanup for the whole patch,
> > while on it.
>
> I don't think "intrinsics" is the correct word to use here:
>
> + * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers
> + * to use SIMD intrinsics if available, up to the last aligned location
>
> and
>
> + * All comparisons are combined with a single OR operation, making it a
> + * good candidate for SIMD intrinsics, if available.
>
> an intrinsic function is a function built into the compiler that
> provides some lower-level functionality. e.g. __builtin_popcount().
Agree, replaced by "instructions" in v10 attached.
> I'm slightly worried due to the current rate we're receiving cleanup
> suggestions that someone might come along and think they'd be doing us
> a favour by submitting a patch to "fixup the inefficient bitwise-ORs
> and use boolean-OR".
That's a good point, better to be cautious here.
> Maybe a comment like the following might prevent
> that from happening.
>
> + * For performance reasons, we manually unroll this loop and purposefully
> + * use bitwise-ORs to combine each comparison. This prevents boolean
> + * short-circuiting and lets the compiler know that it's safe to access all 8
> + * elements regardless of the result of the other comparisons. This seems
> + * to be enough to coax a few compilers into using SIMD instructions.
Sounds good to me, used the above in v10.
v10 also splits the patch into 2 parts as suggested by Michael up-thread.
>
> > Btw, gcc seems a bit smarter than clang when it comes to optimizing
> > the code depending on the size of the structures. gcc gives up on
> > SIMD if it's sure that the structure on which we are going to use the
> > allzero check won't need it at all, and clang keeps it even if it does
> > not need it. That was interesting to see, while going through the
> > review..
>
> Can you share your test case for this?
+1
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v10-0001-Optimize-pg_memory_is_all_zeros.patch | text/x-diff | 3.4 KB |
v10-0002-Make-use-of-pg_memory_is_all_zeros-in-PageIsVeri.patch | text/x-diff | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2024-11-08 17:44:24 | Re: New GUC autovacuum_max_threshold ? |
Previous Message | Nathan Bossart | 2024-11-08 17:31:17 | Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C |