Re: define pg_structiszero(addr, s, r)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: define pg_structiszero(addr, s, r)
Date: 2024-11-13 00:25:37
Message-ID: ZzPyAZmanAZDa8ir@paquier.xyz
Lists: pgsql-hackers

On Tue, Nov 12, 2024 at 10:56:20AM +0000, Bertrand Drouvot wrote:
> I think that depends on the size of the memory area. If the size is small
> enough then the byte-per-byte check can be good enough.
>
> For example, with the allzeros_small.c attached:
>
> == with BLCKSZ 32
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
> byte per byte: done in 22528 nanoseconds
> size_t: done in 6949 nanoseconds (3.24191 times faster than byte per byte)
> SIMD v10: done in 7562 nanoseconds (2.97911 times faster than byte per byte)
> SIMD v11: done in 22096 nanoseconds (1.01955 times faster than byte per byte)

Some numbers from here, for the same test case at 32 bytes, with an
older version of gcc:
$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
$ gcc -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 28193 nanoseconds
size_t: done in 4382 nanoseconds (6.43382 times faster than byte per byte)
SIMD v10: done in 8074 nanoseconds (3.49183 times faster than byte per byte)
SIMD v11: done in 26970 nanoseconds (1.04535 times faster than byte per byte)

> == with BLCKSZ 63
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
> byte per byte: done in 29246 nanoseconds
> size_t: done in 10555 nanoseconds (2.77082 times faster than byte per byte)
> SIMD v10: done in 11220 nanoseconds (2.6066 times faster than byte per byte)
> SIMD v11: done in 29126 nanoseconds (1.00412 times faster than byte per byte)
>
> Obviously v11 takes about the same time as "byte per byte", but we can see
> that the size_t or v10 improvement is not that large for small sizes.

For 63 bytes:
byte per byte: done in 52611 nanoseconds
size_t: done in 21309 nanoseconds (2.46896 times faster than byte per byte)
SIMD v10: done in 16181 nanoseconds (3.25141 times faster than byte per byte)
SIMD v11: done in 51931 nanoseconds (1.01309 times faster than byte per byte)

> While for larger size:
>
> It's a noticeable improvement.

Yep, for large sizes.

> Based on the above I have the feeling that doing a byte-per-byte comparison
> for small sizes only (< 64 bytes) is good enough. I'm not sure that adding
> extra complexity for small sizes is worth it.

Well, this is also telling us that we are at least 2 times faster if
we use allzeros_size_t() for areas smaller than 64 bytes rather than
allzeros_byte_per_byte() per your measurement, and I'm seeing an even
larger gap at 32 bytes. So that seems worth the addition, especially
for smaller sizes where this is 6 times faster here.
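
For reference, a minimal sketch of the two approaches being compared.
This is not the code from allzeros_small.c, which is not reproduced in
this message. The function names follow the ones used above, while the
signatures and bodies (including the alignment handling and the use of
memcpy() for the word loads) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only; not the benchmarked implementation. */

/* Check one byte at a time. */
static inline bool
allzeros_byte_per_byte(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;
	const unsigned char *end = p + len;

	for (; p < end; p++)
	{
		if (*p != 0)
			return false;
	}
	return true;
}

/* Check one size_t-sized word at a time where possible. */
static inline bool
allzeros_size_t(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;
	const unsigned char *end = p + len;

	/* Byte steps until p sits on a size_t boundary. */
	while (p < end && ((uintptr_t) p % sizeof(size_t)) != 0)
	{
		if (*p++ != 0)
			return false;
	}

	/* Word steps over the aligned middle part. */
	while ((size_t) (end - p) >= sizeof(size_t))
	{
		size_t		word;

		/* memcpy() sidesteps strict-aliasing concerns. */
		memcpy(&word, p, sizeof(word));
		if (word != 0)
			return false;
		p += sizeof(size_t);
	}

	/* Byte steps for whatever is left at the tail. */
	while (p < end)
	{
		if (*p++ != 0)
			return false;
	}
	return true;
}
```

With a 32-byte aligned buffer on a 64-bit machine, the second variant
does 4 word comparisons where the first does 32 byte comparisons, which
is the kind of gap the numbers above reflect.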
--
Michael
