From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-05 23:16:33 |
Message-ID: | CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 6 Nov 2024 at 04:03, Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> Another option could be to use SIMD instructions to check multiple bytes
> is zero in a single operation. Maybe just an idea to keep in mind and experiment
> if we feel the need later on.
Could do. I just wrote it that way to give the compiler flexibility to
do SIMD implicitly. That seemed easier than messing around with SIMD
intrinsics. I guess the compiler won't use SIMD with the single
size_t-at-a-time version as it can't be certain it's ok to access the
memory beyond the first zero word. Because I wrote the "if" condition
using bitwise-OR, there's no boolean short-circuiting, so the compiler
sees it must be safe to access all the memory for the loop iteration.
If I use -march=native or -march=znver2 on my Zen2 machine, gcc does
use SIMD operators. Clang uses some 128-bit registers without
specifying -march:
drowley(at)amd3990x:~$ gcc -O2 allzeros.c -march=native -o allzeros &&
for i in {1..3}; do ./allzeros; done
char: done in 1940539 nanoseconds
size_t: done in 261731 nanoseconds (7.41425 times faster than char)
size_t * 4: done in 130415 nanoseconds (14.8797 times faster than char)
size_t * 8: done in 70031 nanoseconds (27.7097 times faster than char)
char: done in 3030132 nanoseconds
size_t: done in 477044 nanoseconds (6.35189 times faster than char)
size_t * 4: done in 123551 nanoseconds (24.5254 times faster than char)
size_t * 8: done in 68549 nanoseconds (44.2039 times faster than char)
char: done in 3214037 nanoseconds
size_t: done in 256901 nanoseconds (12.5108 times faster than char)
size_t * 4: done in 126017 nanoseconds (25.5048 times faster than char)
size_t * 8: done in 73167 nanoseconds (43.9274 times faster than char)
David
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2024-11-05 23:32:53 | Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE |
Previous Message | Nikolay Samokhvalov | 2024-11-05 23:09:33 | Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE |