Re: define pg_structiszero(addr, s, r)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: define pg_structiszero(addr, s, r)
Date: 2024-11-12 16:32:36
Message-ID: CAEudQAqLKUaaaQMtJEUmQaCKxB=sU_2CcN=s7TeWAtRCx4oaPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em ter., 12 de nov. de 2024 às 07:56, Bertrand Drouvot <
bertranddrouvot(dot)pg(at)gmail(dot)com> escreveu:

> Hi,
>
> On Tue, Nov 12, 2024 at 03:56:13PM +0900, Michael Paquier wrote:
> > On Tue, Nov 12, 2024 at 06:09:04AM +0000, Bertrand Drouvot wrote:
> > > I think that the 64b len check done in v11 is mandatory for safety
> reasons.
> > >
> > > The loop above reads 64 bytes at once, so would read beyond the memory
> area bounds
> > > if len < 64: That could cause crash or read invalid data.
> >
> > Sorry, I was not following your argument. You're right that we need
> > something else here. However..
> >
> > + /*
> > + * For len < 64, compare byte per byte to ensure we'll not read
> beyond the
> > + * memory area.
> > + */
> > + if (len < sizeof(size_t) * 8)
> > + {
> > + while (p < end)
> > + {
> > + if (*p++ != 0)
> > + return false;
> > + }
> > + return true;
> > + }
> > +
> > + /* Compare bytes until the pointer "p" is aligned */
> > + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
> > + {
> > + if (p == end)
> > + return true;
> > +
> > + if (*p++ != 0)
> > + return false;
> > + }
> > +
> >
> > Still, this is not optimal, based on what's been discussed upthread.
> > The byte-per-byte check is more expensive than the size_t check,
>
> I think that depends of the memory area size. If the size is small enough
> then the
> byte per byte can be good enough.
>
> For example, with the allzeros_small.c attached:
>
It seems to me that it is enough to protect the SIMD loop when the size is
smaller.

if (len > sizeof(size_t) * 8)
{
for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) *
8)
{
if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) |
(((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0) |
(((size_t *) p)[4] != 0) | (((size_t *) p)[5] != 0) |
(((size_t *) p)[6] != 0) | (((size_t *) p)[7] != 0))
return false;
}
}

See v1_allzeros_small.c attached.

>
> == with BLCKSZ 32
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c
> -o allzeros_small ; ./allzeros_small
> byte per byte: done in 22528 nanoseconds
> size_t: done in 6949 nanoseconds (3.24191 times faster than byte per byte)
> SIMD v10: done in 7562 nanoseconds (2.97911 times faster than byte per
> byte)
> SIMD v11: done in 22096 nanoseconds (1.01955 times faster than byte per
> byte)
>
gcc -march=native -O2 v1_allzeros_small.c -o v1_allzeros_small ;
./v1_allzeros_small
byte per byte: done in 97345 nanoseconds
size_t: done in 20305 nanoseconds (4.79414 times faster than byte per byte)
SIMD v10: done in 25813 nanoseconds (3.77116 times faster than byte per
byte)
SIMD v11: done in 24580 nanoseconds (3.96033 times faster than byte per
byte)

>
> == with BLCKSZ 63
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c
> -o allzeros_small ; ./allzeros_small
> byte per byte: done in 29246 nanoseconds
> size_t: done in 10555 nanoseconds (2.77082 times faster than byte per byte)
> SIMD v10: done in 11220 nanoseconds (2.6066 times faster than byte per
> byte)
> SIMD v11: done in 29126 nanoseconds (1.00412 times faster than byte per
> byte)
>
gcc -march=native -O2 v1_allzeros_small.c -o v1_allzeros_small ;
./v1_allzeros_small
byte per byte: done in 57763 nanoseconds
size_t: done in 19760 nanoseconds (2.92323 times faster than byte per byte)
SIMD v10: done in 24088 nanoseconds (2.398 times faster than byte per byte)
SIMD v11: done in 20151 nanoseconds (2.86651 times faster than byte per
byte)

> Obviously v11 is about the same time as "byte per byte" but we can see
> that the
> size_t or v10 improvment is not that much for small size.
>
> While for larger size:
>
> == with BLCKSZ 256
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c
> -o allzeros_small ; ./allzeros_small
> byte per byte: done in 102703 nanoseconds
> size_t: done in 15381 nanoseconds (6.67726 times faster than byte per byte)
> SIMD v10: done in 7241 nanoseconds (14.1835 times faster than byte per
> byte)
> SIMD v11: done in 7899 nanoseconds (13.002 times faster than byte per byte)
>
gcc -march=native -O2 v1_allzeros_small.c -o v1_allzeros_small ;
./v1_allzeros_small
byte per byte: done in 213276 nanoseconds
size_t: done in 45288 nanoseconds (4.70933 times faster than byte per byte)
SIMD v10: done in 15840 nanoseconds (13.4644 times faster than byte per
byte)
SIMD v11: done in 15773 nanoseconds (13.5216 times faster than byte per
byte)

> == with BLCKSZ 8192
>
> $ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c
> -o allzeros_small ; ./allzeros_small
> byte per byte: done in 2993458 nanoseconds
> size_t: done in 436650 nanoseconds (6.85551 times faster than byte per
> byte)
> SIMD v10: done in 136413 nanoseconds (21.9441 times faster than byte per
> byte)
> SIMD v11: done in 155474 nanoseconds (19.2538 times faster than byte per
> byte)
>
gcc -march=native -O2 v1_allzeros_small.c -o v1_allzeros_small ;
./v1_allzeros_small
byte per byte: done in 10358761 nanoseconds
size_t: done in 864673 nanoseconds (11.98 times faster than byte per byte)
SIMD v10: done in 342880 nanoseconds (30.211 times faster than byte per
byte)
SIMD v11: done in 341332 nanoseconds (30.3481 times faster than byte per
byte)

best regards,
Ranier Vilela

Attachment Content-Type Size
v1_allzeros_small.c text/x-csrc 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-11-12 16:34:36 Re: Extract constants from EXECUTE queries
Previous Message Peter Eisentraut 2024-11-12 16:17:16 Re: Virtual generated columns