From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: define pg_structiszero(addr, s, r) |
Date: | 2024-11-07 00:44:32 |
Message-ID: | ZywNcJRvkJ73lcb-@paquier.xyz |
Lists: | pgsql-hackers |
On Thu, Nov 07, 2024 at 08:10:17AM +1300, David Rowley wrote:
> Did you try with a size where there's a decent remainder, say 124
> bytes? FWIW, one of the cases has 112 bytes, and I think that is
> aligned memory meaning we'll do the first 64 in the SIMD loop and have
> to do 48 bytes in the byte-at-a-time loop. If you had the loop Michael
> mentioned, that would instead be 6 loops of size_t-at-a-time.
See the attached allzeros.c, based on the previous versions exchanged.
And now just imagine a structure like this:
    #define BLCKSZ 48
    typedef union AlignedBlock
    {
        char data[BLCKSZ];
        double force_align_d;
        int64_t force_align_i64;
    } AlignedBlock;
With this structure, the first step (the char-at-a-time loop that
runs until the pointer is aligned) is skipped because the pointer is
already aligned when allocated, and the second step (the potential
SIMD loop) is skipped because the structure is too small at 48 bytes.
Hence only the last, byte-at-a-time step would do the allzero check.
Adding a size_t step before it, so that most of the bytes are checked
size_t-at-a-time, is going to be more efficient, as proved upthread:
$ gcc -o allzeros -march=native -O2 allzeros.c
$ ./allzeros
allzeros: done in 118332297 nanoseconds
allzeros_v2: done in 13877745 nanoseconds (8.52677 times faster)
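For readers without the attachment at hand, here is a minimal sketch of the steps described above, with the extra size_t step included. It is not the code from allzeros.c: the names allzeros_sketch(), load_word() and aligned_block_demo() are hypothetical, the 8-word chunk size is an assumption taken from the "first 64" bytes mentioned in the quoted text, and words are loaded with memcpy() purely to keep the sketch clear of aliasing concerns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 48
typedef union AlignedBlock
{
    char data[BLCKSZ];
    double force_align_d;
    int64_t force_align_i64;
} AlignedBlock;

/* Hypothetical helper: load one size_t word from "p". */
static inline size_t
load_word(const unsigned char *p)
{
    size_t w;

    memcpy(&w, p, sizeof(w));
    return w;
}

/* Hypothetical sketch; returns true if all "len" bytes at "ptr" are zero. */
static bool
allzeros_sketch(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;
    const unsigned char *end = p + len;

    /* Step 1: byte-at-a-time until "p" is aligned on sizeof(size_t). */
    while (p < end && ((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
    {
        if (*p++ != 0)
            return false;
    }

    /* Step 2: 8 words per iteration, a shape compilers may vectorize. */
    while ((size_t) (end - p) >= sizeof(size_t) * 8)
    {
        size_t acc = 0;

        for (int i = 0; i < 8; i++)
            acc |= load_word(p + i * sizeof(size_t));
        if (acc != 0)
            return false;
        p += sizeof(size_t) * 8;
    }

    /* Step 3 (the added one): remaining whole words, size_t-at-a-time. */
    while ((size_t) (end - p) >= sizeof(size_t))
    {
        if (load_word(p) != 0)
            return false;
        p += sizeof(size_t);
    }

    /* Step 4: trailing bytes, fewer than sizeof(size_t) of them. */
    while (p < end)
    {
        if (*p++ != 0)
            return false;
    }
    return true;
}

/* Usage on the 48-byte block: steps 1 and 2 do nothing, step 3 does it all. */
static bool
aligned_block_demo(void)
{
    AlignedBlock blk;

    memset(&blk, 0, sizeof(blk));
    assert(allzeros_sketch(&blk, sizeof(blk)));
    blk.data[BLCKSZ - 1] = 1;
    return !allzeros_sketch(&blk, sizeof(blk));
}
```

With an 8-byte size_t, the 48-byte AlignedBlock is covered by 6 iterations of step 3 instead of 48 iterations of the byte loop, which matches the "6 loops of size_t-at-a-time" in the quoted text.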
The allzero check is used for pgstat entries, and it is possible that
some out-of-core code relies on such small-ish sizes, or that a patch
author uses it for something else entirely. So let's optimize it as
much as we think we can: that's what this discussion is about.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
allzeros.c | text/x-csrc | 3.0 KB |