From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-13 07:50:50 |
Message-ID: | ZzRaWs4KxgPnJGNO@ip-10-97-1-34.eu-west-3.compute.internal |
Lists: | pgsql-hackers |
Hi,
On Wed, Nov 13, 2024 at 09:25:37AM +0900, Michael Paquier wrote:
> So that seems worth the addition, especially for
> smaller sizes where this is 6 times faster here.
So, something like v12 in pg_memory_is_all_zeros_v12() in allzeros_small.c
attached?
If so, that gives us:
== with BLCKSZ 32
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 22421 nanoseconds
size_t: done in 7269 nanoseconds (3.08447 times faster than byte per byte)
SIMD v10: done in 6349 nanoseconds (3.53142 times faster than byte per byte)
SIMD v11: done in 22080 nanoseconds (1.01544 times faster than byte per byte)
SIMD v12: done in 5595 nanoseconds (4.00733 times faster than byte per byte)
== with BLCKSZ 63
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 29525 nanoseconds
size_t: done in 11232 nanoseconds (2.62865 times faster than byte per byte)
SIMD v10: done in 10828 nanoseconds (2.72673 times faster than byte per byte)
SIMD v11: done in 42056 nanoseconds (0.70204 times faster than byte per byte)
SIMD v12: done in 10468 nanoseconds (2.8205 times faster than byte per byte)
== with BLCKSZ 256
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 120483 nanoseconds
size_t: done in 23098 nanoseconds (5.21617 times faster than byte per byte)
SIMD v10: done in 6737 nanoseconds (17.8838 times faster than byte per byte)
SIMD v11: done in 6621 nanoseconds (18.1971 times faster than byte per byte)
SIMD v12: done in 6519 nanoseconds (18.4818 times faster than byte per byte)
== with BLCKSZ 8192
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 3393459 nanoseconds
size_t: done in 707304 nanoseconds (4.79774 times faster than byte per byte)
SIMD v10: done in 233559 nanoseconds (14.5293 times faster than byte per byte)
SIMD v11: done in 225951 nanoseconds (15.0186 times faster than byte per byte)
SIMD v12: done in 225766 nanoseconds (15.0309 times faster than byte per byte)
That's better for small sizes, but given the extra len checks that have been
added, I think we're back to David's point in [1]: what if the function
is not inlined for some reason?
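To make the "extra len checks" concrete, here is a rough sketch of the general shape such a function can take. To be clear, this is not the v12 code from the attached allzeros_small.c: the name mem_is_all_zeros_sketch() and the 8-word block size are just assumptions for illustration. Each loop condition is one of the length checks that the compiler can fold away when the function is inlined with a constant len:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the function from the attachment.
 * Returns true if the len bytes starting at ptr are all zero.
 */
static inline bool
mem_is_all_zeros_sketch(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;
	const unsigned char *end;
	size_t		w;

	assert(ptr != NULL);
	end = p + len;

	/* Len check 1: is there room for a block of 8 words? */
	while ((size_t) (end - p) >= 8 * sizeof(size_t))
	{
		size_t		acc = 0;

		/* OR the words together so there is one branch per block */
		for (int i = 0; i < 8; i++)
		{
			/* memcpy keeps the load valid for unaligned input */
			memcpy(&w, p + i * sizeof(size_t), sizeof(size_t));
			acc |= w;
		}
		if (acc != 0)
			return false;
		p += 8 * sizeof(size_t);
	}

	/* Len check 2: is there room for a single word? */
	while ((size_t) (end - p) >= sizeof(size_t))
	{
		memcpy(&w, p, sizeof(size_t));
		if (w != 0)
			return false;
		p += sizeof(size_t);
	}

	/* Len check 3: remaining tail, byte per byte */
	while (p < end)
	{
		if (*p++ != 0)
			return false;
	}
	return true;
}
```

With a small len (say 32 or 63) the first loop is skipped entirely, which is where these checks matter if they cannot be resolved at compile time.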
So, out of curiosity, let's see what happens if not inlined in [2] (see the
-O2 -DNOT_INLINE compiler window):
- if a[3]: it looks like gcc is smart enough to create an optimized version
for that size using constant propagation
- if a[63]: Same as above
- if a[256]: Same as above
- if a[8192]: Same as above
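For readers who want to reproduce this kind of experiment locally, a minimal sketch of how inlining could be toggled from the command line is below. The SKETCH_INLINE macro, bytes_all_zeros() and check_fixed_63() are hypothetical names, and the way NOT_INLINE is wired up here is an assumption (the actual setup is the one in [2]). The noinline attribute is a gcc/clang extension:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed wiring: building with -DNOT_INLINE forbids inlining of the
 * helper, so the generated code shows what the compiler does with an
 * out-of-line call that receives a constant length.
 */
#ifdef NOT_INLINE
#define SKETCH_INLINE static __attribute__((noinline))
#else
#define SKETCH_INLINE static inline
#endif

/* Simple byte-per-byte check, used only to observe the code generation */
SKETCH_INLINE bool
bytes_all_zeros(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Caller with a compile-time constant length, as in the a[63] case */
bool
check_fixed_63(const unsigned char *a)
{
	return bytes_all_zeros(a, 63);
}
```

Comparing the assembly produced with and without -DNOT_INLINE at -O2 shows whether the compiler specializes the helper for the constant length.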
I did a quick check with clang, and it looks like it is not as smart as gcc
in the non-inlined case.
Anyway, it's not like we have a choice: we need (at least) one len check for
safety reasons (so as not to crash or read invalid data).
So, I'd vote for pg_memory_is_all_zeros_v12() then, thoughts?
[1]: https://www.postgresql.org/message-id/CAApHDvp2jx_%3DpFbgj-O1_ZmzP9WOZKfwLzZrS_%3DZmbsqMQQ59g%40mail.gmail.com
[2]: https://godbolt.org/z/8s44GKqcc
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
allzeros_small.c | text/x-csrc | 10.3 KB |