From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-05 03:23:34 |
Message-ID: | CAApHDvrYW5m8iwwd3NxZxtcNJ_+M9BdMiDidvyUcPKJHxb4Zdg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 5 Nov 2024 at 06:39, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> I think we can add a small optimization to this last patch [1].
> The variable *aligned_end* is only needed in the second loop (for).
> So, only before the for loop do we actually declare it.
>
> Result before this change:
> check zeros using BERTRAND 1 0.000031s
>
> Result after this change:
> check zeros using BERTRAND 1 0.000018s
>
> + const unsigned char *aligned_end;
>
> + /* Multiple bytes comparison(s) at once */
> + aligned_end = (const unsigned char *) ((uintptr_t) end & (~(sizeof(size_t) - 1)));
> + for (; p < aligned_end; p += sizeof(size_t))
I think we all need to stop using Godbolt's servers to run benchmarks
on. These servers are likely to be running various other workloads in
highly virtualised environments and are not going to be stable servers
that would give consistent benchmark results.
I tried your optimisation in the attached allzeros.c and here are my results:
# My version
$ gcc allzeros.c -O2 -o allzeros && for i in {1..3}; do ./allzeros; done
char: done in 1566400 nanoseconds
size_t: done in 195400 nanoseconds (8.01638 times faster than char)
char: done in 1537500 nanoseconds
size_t: done in 196300 nanoseconds (7.8324 times faster than char)
char: done in 1543600 nanoseconds
size_t: done in 196300 nanoseconds (7.86347 times faster than char)
# Ranier's optimization
$ gcc allzeros.c -O2 -D RANIERS_OPTIMIZATION -o allzeros && for i in
{1..3}; do ./allzeros; done
char: done in 1943100 nanoseconds
size_t: done in 531700 nanoseconds (3.6545 times faster than char)
char: done in 1957200 nanoseconds
size_t: done in 458400 nanoseconds (4.26963 times faster than char)
char: done in 1949500 nanoseconds
size_t: done in 469000 nanoseconds (4.15672 times faster than char)
Seems to be about half as fast with gcc on -O2
David
Attachment | Content-Type | Size |
---|---|---|
allzeros.c | text/plain | 2.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2024-11-05 04:08:41 | Re: define pg_structiszero(addr, s, r) |
Previous Message | Joel Jacobson | 2024-11-05 03:22:13 | Re: New "raw" COPY format |