From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-16 14:40:58 |
Message-ID: | CAEudQApzEUAg+EYs=jtvSTA7E7zZ21vc5Xcz64Ra_1FAxq8LcA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Nov 15, 2024 at 11:43, Bertrand Drouvot <
bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> Hi,
>
> On Fri, Nov 15, 2024 at 09:54:33AM -0300, Ranier Vilela wrote:
> > There is a tiny typo with V13.
> > + /* "len" in the [sizeof(size_t) * 8, inf] range */
>
> I think "[sizeof(size_t) * 8, inf[ range" is correct. Infinity cannot be
> included in an interval.
>
> Thinking about it, actually, "[sizeof(size_t) * 8, inf)" (note the ')' at
> the end) might be the proper notation from a mathematical point of view.
>
Thanks for clarifying.
>
> > But, I'm not sure if I'm still doing something wrong.
> > If so, forgive me for the noise.
> >
> > Of course I expected "not is_allzeros".
>
> That's the test case which is "wrong" (not the function):
>
> "
> size_t pagebytes[BLCKSZ] = {0};
> volatile bool result;
>
> pagebytes[BLCKSZ-2] = 1;
>
> result = pg_memory_is_all_zeros_v12(pagebytes, BLCKSZ);
> "
>
> The pagebytes is an array of size_t (8 bytes each), so the actual array
> size is 8192 * 8 = 65536 bytes.
>
> So, pagebytes[BLCKSZ-2] = 1, sets byte 65520 ((8192-2)*8) to 1.
>
> But the function is checking only up to BLCKSZ bytes (8192), hence the
> results you observed (which are correct).
>
Thanks for pointing out my mistake.
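
To make the size mismatch concrete, here is a minimal sketch. The helper names (bytes_are_all_zeros, element_byte_offset, wide_array_case, byte_array_case) are made up for illustration and are not from any patch version; BLCKSZ is assumed to be 8192. The write to a size_t element lands far beyond the first BLCKSZ bytes, so a check limited to BLCKSZ bytes never sees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192				/* assumption: default block size */

/* Hypothetical reference check: scans exactly len bytes, one at a time. */
static bool
bytes_are_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Byte offset of element idx in an array of elem_size-byte elements. */
static size_t
element_byte_offset(size_t idx, size_t elem_size)
{
	return idx * elem_size;
}

/* The test case as posted: BLCKSZ elements of size_t, not BLCKSZ bytes. */
static bool
wide_array_case(void)
{
	static size_t pagebytes[BLCKSZ];	/* BLCKSZ * sizeof(size_t) bytes */

	pagebytes[BLCKSZ - 2] = 1;

	/* The modified element starts past the range that gets checked. */
	assert(element_byte_offset(BLCKSZ - 2, sizeof(size_t)) >= BLCKSZ);

	return bytes_are_all_zeros(pagebytes, BLCKSZ);
}

/* Same test with a byte array: the write is inside the checked range. */
static bool
byte_array_case(void)
{
	static unsigned char pagebytes[BLCKSZ];

	pagebytes[BLCKSZ - 2] = 1;
	return bytes_are_all_zeros(pagebytes, BLCKSZ);
}
```

With these definitions, wide_array_case() reports all zeros while byte_array_case() does not, which matches the behavior described in the quoted explanation.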
>
> > Anyway, I made another attempt to optimize a bit more, I don't know if it's
> > safe though.
>
> There is an issue in your v14, it calls:
>
> "
> return pg_memory_is_all_zeros_simd(ptr, ptr + len);
> "
>
> but you defined it that way:
>
> "
> static inline bool
> pg_memory_is_all_zeros_simd(const size_t *p, const size_t * end)
>
> "
>
> while that should be:
>
> "
> static inline bool
> pg_memory_is_all_zeros_simd(const void *p, const void *end)
>
> "
>
What I'm trying here is, obviously, a hack.
If it works and the compiler accepts it, that is OK with me.
>
> Doing so, I do not observe any improvements with v14.
>
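
For reference, a sketch of why the pointer types matter for the (p, end) form. Adding len to a const size_t * advances by len * sizeof(size_t) bytes, and arithmetic on a void * is a GNU extension rather than standard C, so the end pointer is best computed from a byte pointer. The names all_zeros_range and all_zeros are hypothetical, and this is only an illustration of the calling convention, not the v14 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical range-based check over [p, end). Both bounds are untyped,
 * and all arithmetic is done on unsigned char pointers, so "end - p" is a
 * byte count.
 */
static inline bool
all_zeros_range(const void *p, const void *end)
{
	const unsigned char *cur = p;
	const unsigned char *stop = end;

	assert(cur <= stop);

	/* Whole words first; memcpy avoids unaligned size_t loads. */
	while ((size_t) (stop - cur) >= sizeof(size_t))
	{
		size_t		word;

		memcpy(&word, cur, sizeof(word));
		if (word != 0)
			return false;
		cur += sizeof(word);
	}

	/* Remaining tail bytes, fewer than sizeof(size_t) of them. */
	for (; cur < stop; cur++)
	{
		if (*cur != 0)
			return false;
	}
	return true;
}

/* Wrapper taking (ptr, len): the end pointer is computed in bytes. */
static inline bool
all_zeros(const void *ptr, size_t len)
{
	const unsigned char *start = ptr;

	return all_zeros_range(start, start + len);
}
```

Had start been a const size_t *, start + len would point len * sizeof(size_t) bytes past the buffer start, and the loop would read beyond the intended range.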
So, here are new results, again from the attached v4_allzeros_small.c:
Linux Ubuntu 22.04
gcc 13 64 bits
With BLCKSZ 32
gcc -march=native -O2 v4_allzeros_small.c -o v4_allzeros_small ;
./v4_allzeros_small
byte per byte: done in 44092 nanoseconds
size_t: done in 13456 nanoseconds (3.27675 times faster than byte per byte)
SIMD v10: done in 14249 nanoseconds (3.09439 times faster than byte per byte)
SIMD v11: done in 32516 nanoseconds (1.35601 times faster than byte per byte)
SIMD v12: done in 14973 nanoseconds (2.94477 times faster than byte per byte)
SIMD v14: done in 13101 nanoseconds (3.36554 times faster than byte per byte)
With BLCKSZ 63
gcc -march=native -O2 v4_allzeros_small.c -o v4_allzeros_small ;
./v4_allzeros_small
byte per byte: done in 67656 nanoseconds
size_t: done in 25768 nanoseconds (2.62558 times faster than byte per byte)
SIMD v10: done in 21446 nanoseconds (3.15471 times faster than byte per byte)
SIMD v11: done in 56887 nanoseconds (1.18931 times faster than byte per byte)
SIMD v12: done in 22863 nanoseconds (2.95919 times faster than byte per byte)
SIMD v14: done in 21273 nanoseconds (3.18037 times faster than byte per byte)
With BLCKSZ 256
gcc -march=native -O2 v4_allzeros_small.c -o v4_allzeros_small ;
./v4_allzeros_small
byte per byte: done in 220064 nanoseconds
size_t: done in 45886 nanoseconds (4.79589 times faster than byte per byte)
SIMD v10: done in 12032 nanoseconds (18.2899 times faster than byte per byte)
SIMD v11: done in 11965 nanoseconds (18.3923 times faster than byte per byte)
SIMD v12: done in 12041 nanoseconds (18.2762 times faster than byte per byte)
SIMD v14: done in 12575 nanoseconds (17.5001 times faster than byte per byte)
With BLCKSZ 8192
gcc -march=native -O2 v4_allzeros_small.c -o v4_allzeros_small ;
./v4_allzeros_small
byte per byte: done in 10365876 nanoseconds
size_t: done in 827654 nanoseconds (12.5244 times faster than byte per byte)
SIMD v10: done in 347755 nanoseconds (29.808 times faster than byte per byte)
SIMD v11: done in 342813 nanoseconds (30.2377 times faster than byte per byte)
SIMD v12: done in 341124 nanoseconds (30.3874 times faster than byte per byte)
SIMD v14: done in 50646 nanoseconds (204.673 times faster than byte per byte)
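
For anyone who wants to reproduce numbers of this shape, here is a rough sketch of a timing harness. It is not the attached v4_allzeros_small.c; the names time_check_ns, speedup and byte_per_byte are hypothetical, and it assumes a POSIX system providing clock_gettime(). The "times faster" figure is simply the baseline time divided by the candidate time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() under strict -std */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef bool (*zero_check_fn) (const void *ptr, size_t len);

/* Hypothetical baseline: one byte at a time. */
static bool
byte_per_byte(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Elapsed nanoseconds for "iterations" calls of fn on the same buffer. */
static int64_t
time_check_ns(zero_check_fn fn, const void *buf, size_t len, int iterations)
{
	struct timespec start;
	struct timespec stop;
	volatile bool sink = false;	/* volatile store keeps the result live */

	assert(iterations > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < iterations; i++)
		sink = fn(buf, len);
	clock_gettime(CLOCK_MONOTONIC, &stop);
	(void) sink;

	return (int64_t) (stop.tv_sec - start.tv_sec) * 1000000000
		+ (int64_t) (stop.tv_nsec - start.tv_nsec);
}

/* "N times faster than baseline" as a plain ratio of elapsed times. */
static double
speedup(int64_t baseline_ns, int64_t candidate_ns)
{
	assert(candidate_ns > 0);
	return (double) baseline_ns / (double) candidate_ns;
}
```

For example, speedup(44092, 13456) gives roughly 3.277, matching the size_t line for BLCKSZ 32 above.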
Results with v4_allzeros_check.c attached:
gcc -march=native -O2 v4_allzeros_check.c -o v4_allzeros_check ;
./v4_allzeros_check
sizeof(pagebytes)=32
byte per byte: is_allzeros
size_t: is_allzeros
SIMD v10: is_allzeros
SIMD v11: is_allzeros
SIMD v12: is_allzeros
SIMD v14: is_allzeros
gcc -march=native -O2 v4_allzeros_check.c -o v4_allzeros_check ;
./v4_allzeros_check
sizeof(pagebytes)=63
byte per byte: is_allzeros
size_t: is_allzeros
SIMD v10: is_allzeros
SIMD v11: is_allzeros
SIMD v12: is_allzeros
SIMD v14: is_allzeros
gcc -march=native -O2 v4_allzeros_check.c -o v4_allzeros_check ;
./v4_allzeros_check
sizeof(pagebytes)=256
byte per byte: is_allzeros
size_t: is_allzeros
SIMD v10: is_allzeros
SIMD v11: is_allzeros
SIMD v12: is_allzeros
p01=(0x7ffedb8ac430)
end=(0x7ffedb8ac530)
p02=(0x7ffedb8ac530)
SIMD v14: is_allzeros
gcc -march=native -O2 v4_allzeros_check.c -o v4_allzeros_check ;
./v4_allzeros_check
sizeof(pagebytes)=8192
byte per byte: is_allzeros
size_t: is_allzeros
SIMD v10: is_allzeros
SIMD v11: is_allzeros
SIMD v12: is_allzeros
p01=(0x7ffd8864c200)
end=(0x7ffd8864e200)
p02=(0x7ffd8864e200)
SIMD v14: is_allzeros
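
As a complement to the runs above, a sketch of a stricter check one could apply to any variant: compare it against a byte-per-byte reference on the all-zero buffer and on every buffer with exactly one nonzero byte. All names here (agrees_with_reference, reference_is_all_zeros, truncating_is_all_zeros) are hypothetical and not taken from v4_allzeros_check.c; the truncating variant is deliberately wrong, to show what such a check would catch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*zero_check_fn) (const void *ptr, size_t len);

/* Reference: byte per byte over exactly len bytes. */
static bool
reference_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Deliberately broken candidate: checks whole words, ignores the tail. */
static bool
truncating_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;
	size_t		nwords = len / sizeof(size_t);

	for (size_t i = 0; i < nwords; i++)
	{
		size_t		word;

		memcpy(&word, p + i * sizeof(word), sizeof(word));
		if (word != 0)
			return false;
	}
	return true;				/* tail bytes are never examined */
}

/*
 * True if candidate matches the reference on the all-zero buffer and on
 * each buffer that has a single nonzero byte at offset 0 .. len - 1.
 * The buffer is overwritten and left all-zero on return.
 */
static bool
agrees_with_reference(zero_check_fn candidate, unsigned char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	memset(buf, 0, len);
	if (candidate(buf, len) != reference_is_all_zeros(buf, len))
		return false;

	for (size_t i = 0; i < len; i++)
	{
		bool		same;

		buf[i] = 1;
		same = (candidate(buf, len) == reference_is_all_zeros(buf, len));
		buf[i] = 0;
		if (!same)
			return false;
	}
	return true;
}
```

Running this for lengths such as 32, 63, 256 and 8192 exercises both the word-aligned and the tail paths; the truncating variant passes at 64 bytes but fails at 63.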
If this hack is safe and correct, I think being about 204 times faster
is very good for a block size of 8192.
That said,
V13 is fine as is.
LGTM.
best regards,
Ranier Vilela