From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-13 00:25:37 |
Message-ID: | ZzPyAZmanAZDa8ir@paquier.xyz |
Lists: | pgsql-hackers |
On Tue, Nov 12, 2024 at 10:56:20AM +0000, Bertrand Drouvot wrote:
> I think that depends on the memory area size. If the size is small enough then the
> byte per byte check can be good enough.
>
> For example, with the allzeros_small.c attached:
>
> == with BLCKSZ 32
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
> byte per byte: done in 22528 nanoseconds
> size_t: done in 6949 nanoseconds (3.24191 times faster than byte per byte)
> SIMD v10: done in 7562 nanoseconds (2.97911 times faster than byte per byte)
> SIMD v11: done in 22096 nanoseconds (1.01955 times faster than byte per byte)
Some numbers from here, for the same test case at 32 bytes, with an
older version of gcc:
$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
$ gcc -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 28193 nanoseconds
size_t: done in 4382 nanoseconds (6.43382 times faster than byte per byte)
SIMD v10: done in 8074 nanoseconds (3.49183 times faster than byte per byte)
SIMD v11: done in 26970 nanoseconds (1.04535 times faster than byte per byte)
> == with BLCKSZ 63
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
> byte per byte: done in 29246 nanoseconds
> size_t: done in 10555 nanoseconds (2.77082 times faster than byte per byte)
> SIMD v10: done in 11220 nanoseconds (2.6066 times faster than byte per byte)
> SIMD v11: done in 29126 nanoseconds (1.00412 times faster than byte per byte)
>
> Obviously v11 is about the same time as "byte per byte" but we can see that the
> size_t or v10 improvement is not that much for small sizes.
For 63 bytes:
byte per byte: done in 52611 nanoseconds
size_t: done in 21309 nanoseconds (2.46896 times faster than byte per byte)
SIMD v10: done in 16181 nanoseconds (3.25141 times faster than byte per byte)
SIMD v11: done in 51931 nanoseconds (1.01309 times faster than byte per byte)
> While for larger size:
>
> It's a noticeable improvement.
Yep, for large sizes.
> Based on the above I've the feeling that doing byte per byte comparison for
> small size only (< 64b) is good enough. I'm not sure that adding extra complexity
> for small sizes is worth it.
Well, this is also telling us that we are at least 2 times faster if
we use allzeros_size_t() rather than allzeros_byte_per_byte() for
areas smaller than 64 bytes per your measurements, and I'm seeing even
faster numbers. So that seems worth the addition, especially at the
32-byte size, where this is more than 6 times faster here.
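
To make the comparison concrete, here is a minimal sketch of the two
approaches being measured. It is not the code from the attached
allzeros_small.c, which is not reproduced in this message. Only the
function names come from the discussion above; the bodies are an
assumption about how such checks are commonly written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-per-byte check: one load and one comparison per byte.
 * (Illustrative body, assumed; not taken from allzeros_small.c.)
 */
static bool
allzeros_byte_per_byte(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/*
 * size_t-at-a-time check: compare sizeof(size_t) bytes per iteration,
 * then finish the remaining tail bytes one by one.
 * (Illustrative body, assumed; not taken from allzeros_small.c.)
 */
static bool
allzeros_size_t(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;
	const unsigned char *end = p + len;

	/*
	 * Whole words.  memcpy() into a local keeps the load valid for
	 * unaligned input; compilers typically turn it into a plain load.
	 */
	while ((size_t) (end - p) >= sizeof(size_t))
	{
		size_t		word;

		memcpy(&word, p, sizeof(word));
		if (word != 0)
			return false;
		p += sizeof(size_t);
	}

	/* Tail bytes that do not fill a whole word. */
	while (p < end)
	{
		if (*p++ != 0)
			return false;
	}
	return true;
}
```

With an 8-byte size_t, a 32-byte area takes 4 word comparisons instead
of 32 byte comparisons, and a 63-byte area takes 7 word comparisons
plus 7 tail bytes, which would be consistent with the gain shrinking
at 63 bytes in the numbers above.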
--
Michael