From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-05 12:49:53 |
Message-ID: | CAEudQAqWbLS1q4VZ+d4Tx5T-DHryQ8juu=Kgn9wfiV8iB-pzyA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Em ter., 5 de nov. de 2024 às 00:23, David Rowley <dgrowleyml(at)gmail(dot)com>
escreveu:
> On Tue, 5 Nov 2024 at 06:39, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> > I think we can add a small optimization to this last patch [1].
> > The variable *aligned_end* is only needed in the second loop (for).
> > So, only before the for loop do we actually declare it.
> >
> > Result before this change:
> > check zeros using BERTRAND 1 0.000031s
> >
> > Result after this change:
> > check zeros using BERTRAND 1 0.000018s
> >
> > + const unsigned char *aligned_end;
> >
> > + /* Multiple bytes comparison(s) at once */
> > + aligned_end = (const unsigned char *) ((uintptr_t) end &
> (~(sizeof(size_t) - 1)));
> > + for (; p < aligned_end; p += sizeof(size_t))
>
> I think we all need to stop using Godbolt's servers to run benchmarks
> on. These servers are likely to be running various other workloads in
> highly virtualised environments and are not going to be stable servers
> that would give consistent benchmark results.
>
> I tried your optimisation in the attached allzeros.c and here are my
> results:
>
> # My version
> $ gcc allzeros.c -O2 -o allzeros && for i in {1..3}; do ./allzeros; done
> char: done in 1566400 nanoseconds
> size_t: done in 195400 nanoseconds (8.01638 times faster than char)
> char: done in 1537500 nanoseconds
> size_t: done in 196300 nanoseconds (7.8324 times faster than char)
> char: done in 1543600 nanoseconds
> size_t: done in 196300 nanoseconds (7.86347 times faster than char)
>
> # Ranier's optimization
> $ gcc allzeros.c -O2 -D RANIERS_OPTIMIZATION -o allzeros && for i in
> {1..3}; do ./allzeros; done
> char: done in 1943100 nanoseconds
> size_t: done in 531700 nanoseconds (3.6545 times faster than char)
> char: done in 1957200 nanoseconds
> size_t: done in 458400 nanoseconds (4.26963 times faster than char)
> char: done in 1949500 nanoseconds
> size_t: done in 469000 nanoseconds (4.15672 times faster than char)
>
> Seems to be about half as fast with gcc on -O2
>
Thanks for test coding.
I've tried with msvc 2022 32bits
Here the results:
C:\usr\src\tests\allzeros>allzeros
char: done in 71431900 nanoseconds
size_t: done in 18010900 nanoseconds (3.96604 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 71070100 nanoseconds
size_t: done in 19654300 nanoseconds (3.61601 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 68682400 nanoseconds
size_t: done in 19841100 nanoseconds (3.46162 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 63215100 nanoseconds
size_t: done in 17920200 nanoseconds (3.52759 times faster than char)
C:\usr\src\tests\allzeros>c /DRANIERS_OPTIMIZATION
Microsoft (R) Program Maintenance Utility Versão 14.40.33813.0
Direitos autorais da Microsoft Corporation. Todos os direitos reservados.
C:\usr\src\tests\allzeros>allzeros
char: done in 67213800 nanoseconds
size_t: done in 15049200 nanoseconds (4.46627 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 51505900 nanoseconds
size_t: done in 13645700 nanoseconds (3.77452 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 62852600 nanoseconds
size_t: done in 17863800 nanoseconds (3.51843 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 51877200 nanoseconds
size_t: done in 13759900 nanoseconds (3.77017 times faster than char)
The function used to replace clock_getime is:
timespec_get(ts, TIME_UTC)
best regards,
Ranier Vilela
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2024-11-05 12:51:05 | Re: Changing the state of data checksums in a running cluster |
Previous Message | Junwang Zhao | 2024-11-05 12:30:24 | Re: general purpose array_sort |