Re: define pg_structiszero(addr, s, r)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: define pg_structiszero(addr, s, r)
Date: 2024-11-05 23:16:33
Message-ID: CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 6 Nov 2024 at 04:03, Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> Another option could be to use SIMD instructions to check multiple bytes
> is zero in a single operation. Maybe just an idea to keep in mind and experiment
> if we feel the need later on.

Could do. I just wrote it that way to give the compiler flexibility to
do SIMD implicitly. That seemed easier than messing around with SIMD
intrinsics. I guess the compiler won't use SIMD with the single
size_t-at-a-time version as it can't be certain it's ok to access the
memory beyond the first zero word. Because I wrote the "if" condition
using bitwise-OR, there's no boolean short-circuiting, so the compiler
sees it must be safe to access all the memory for the loop iteration.

If I use -march=native or -march=znver2 on my Zen2 machine, gcc does
use SIMD operators. Clang uses some 128-bit registers without
specifying -march:

drowley(at)amd3990x:~$ gcc -O2 allzeros.c -march=native -o allzeros &&
for i in {1..3}; do ./allzeros; done
char: done in 1940539 nanoseconds
size_t: done in 261731 nanoseconds (7.41425 times faster than char)
size_t * 4: done in 130415 nanoseconds (14.8797 times faster than char)
size_t * 8: done in 70031 nanoseconds (27.7097 times faster than char)
char: done in 3030132 nanoseconds
size_t: done in 477044 nanoseconds (6.35189 times faster than char)
size_t * 4: done in 123551 nanoseconds (24.5254 times faster than char)
size_t * 8: done in 68549 nanoseconds (44.2039 times faster than char)
char: done in 3214037 nanoseconds
size_t: done in 256901 nanoseconds (12.5108 times faster than char)
size_t * 4: done in 126017 nanoseconds (25.5048 times faster than char)
size_t * 8: done in 73167 nanoseconds (43.9274 times faster than char)

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2024-11-05 23:32:53 Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
Previous Message Nikolay Samokhvalov 2024-11-05 23:09:33 Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE