From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-12 10:56:20 |
Message-ID: | ZzM0VNK161K95ZHn@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Tue, Nov 12, 2024 at 03:56:13PM +0900, Michael Paquier wrote:
> On Tue, Nov 12, 2024 at 06:09:04AM +0000, Bertrand Drouvot wrote:
> > I think that the 64b len check done in v11 is mandatory for safety reasons.
> >
> > The loop above reads 64 bytes at once, so would read beyond the memory area bounds
> > if len < 64: That could cause crash or read invalid data.
>
> Sorry, I was not following your argument. You're right that we need
> something else here. However..
>
> + /*
> + * For len < 64, compare byte per byte to ensure we'll not read beyond the
> + * memory area.
> + */
> + if (len < sizeof(size_t) * 8)
> + {
> + while (p < end)
> + {
> + if (*p++ != 0)
> + return false;
> + }
> + return true;
> + }
> +
> + /* Compare bytes until the pointer "p" is aligned */
> + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
> + {
> + if (p == end)
> + return true;
> +
> + if (*p++ != 0)
> + return false;
> + }
> +
>
> Still, this is not optimal, based on what's been discussed upthread.
> The byte-per-byte check is more expensive than the size_t check,
I think that depends of the memory area size. If the size is small enough then the
byte per byte can be good enough.
For example, with the allzeros_small.c attached:
== with BLCKSZ 32
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 22528 nanoseconds
size_t: done in 6949 nanoseconds (3.24191 times faster than byte per byte)
SIMD v10: done in 7562 nanoseconds (2.97911 times faster than byte per byte)
SIMD v11: done in 22096 nanoseconds (1.01955 times faster than byte per byte)
== with BLCKSZ 63
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 29246 nanoseconds
size_t: done in 10555 nanoseconds (2.77082 times faster than byte per byte)
SIMD v10: done in 11220 nanoseconds (2.6066 times faster than byte per byte)
SIMD v11: done in 29126 nanoseconds (1.00412 times faster than byte per byte)
Obviously v11 is about the same time as "byte per byte" but we can see that the
size_t or v10 improvment is not that much for small size.
While for larger size:
== with BLCKSZ 256
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 102703 nanoseconds
size_t: done in 15381 nanoseconds (6.67726 times faster than byte per byte)
SIMD v10: done in 7241 nanoseconds (14.1835 times faster than byte per byte)
SIMD v11: done in 7899 nanoseconds (13.002 times faster than byte per byte)
== with BLCKSZ 8192
$ /usr/local/gcc-14.1.0/bin/gcc-14.1.0 -march=native -O2 allzeros_small.c -o allzeros_small ; ./allzeros_small
byte per byte: done in 2993458 nanoseconds
size_t: done in 436650 nanoseconds (6.85551 times faster than byte per byte)
SIMD v10: done in 136413 nanoseconds (21.9441 times faster than byte per byte)
SIMD v11: done in 155474 nanoseconds (19.2538 times faster than byte per byte)
It's sensitive improvment.
> shouldn't you make sure that you stack some size_t checks if dealing
> with something smaller than 64 bytes?
Based on the above I've the feeling that doing byte per byte comparison for
small size only (< 64b) is good enough. I'm not sure that adding extra complexity
for small sizes is worth it.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
allzeros_small.c | text/x-csrc | 7.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2024-11-12 11:21:08 | Re: Parallel heap vacuum |
Previous Message | Torsten Förtsch | 2024-11-12 10:54:21 | Re: Allowing pg_recvlogical to create temporary replication slots |