From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date: | 2025-01-15 10:56:44 |
Message-ID: | CANWCAZaq-hHGzEE9u=+ed3czdn6WUQdmGpmUpBiT7p+wNsOS+A@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Couple of thoughts:
>
> 1. I was actually hoping for a comment on the constant's definition,
> perhaps along the lines of
>
> /*
> * The hex expansion of each possible byte value (two chars per value).
> */
Works for me. With that, did you mean we then wouldn't need a comment
in the code?
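To make the "two chars per value" layout concrete, here is a minimal sketch of such a table. It is not the definition from the patch: the names hextbl_sketch and init_hextbl_sketch are hypothetical, lowercase digits are assumed, and the table is filled at run time only to keep the example short, whereas the thread talks about a constant.

```c
#include <stddef.h>

/*
 * The hex expansion of each possible byte value (two chars per value).
 *
 * Hypothetical sketch: filled at run time here for brevity; the table
 * discussed in the thread is a constant.
 */
static char hextbl_sketch[512];

static void
init_hextbl_sketch(void)
{
    /* Assumption: lowercase hex digits. */
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < 256; i++)
    {
        hextbl_sketch[2 * i] = digits[i >> 4];       /* high nibble */
        hextbl_sketch[2 * i + 1] = digits[i & 0xF];  /* low nibble */
    }
}
```

With this layout, the two characters for byte value b sit at offsets 2 * b and 2 * b + 1, which is what the &hextbl[2 * usrc] lookups quoted further down rely on.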
> 2. Since "src" is defined as "const char *", I'm pretty sure that
> pickier compilers will complain that
>
> + unsigned char usrc = *((unsigned char *) src);
>
> results in casting away const. Recommend
>
> + unsigned char usrc = *((const unsigned char *) src);
Thanks for the reminder!
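For illustration, the const-preserving form could be isolated as below. The helper name first_byte_unsigned is hypothetical and not part of the patch. Reading through an unsigned type matters because plain char may be signed, in which case a byte such as 0xFF would become a negative table index.

```c
/*
 * Hypothetical helper: return the first byte of src as an unsigned value.
 *
 * Casting to "const unsigned char *" keeps the const qualifier, so
 * compilers that warn about discarding qualifiers (for example gcc with
 * -Wcast-qual) have nothing to complain about.
 */
static unsigned char
first_byte_unsigned(const char *src)
{
    return *((const unsigned char *) src);
}
```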
> 3. I really wonder if
>
> + memcpy(dst, &hextbl[2 * usrc], 2);
>
> is faster than copying the two bytes manually, along the lines of
>
> + *dst++ = hextbl[2 * usrc];
> + *dst++ = hextbl[2 * usrc + 1];
>
> Compilers that inline memcpy() may arrive at the same machine code,
> but why rely on the compiler to make that optimization? If the
> compiler fails to do so, an out-of-line memcpy() call will surely
> be a loser.
See measurements at the end. As for compilers, gcc 3.4.6 and clang
3.0.0 can already inline the memcpy. The manual copy above only gets
combined into a single two-byte load/store starting with gcc 12 and
clang 15, and the latest MSVC still can't do it (4A in the godbolt link
below). Are there any buildfarm animals around that may not inline
memcpy for word-sized input?
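As a self-contained sketch of the two forms being compared, the memcpy version and the manual copy (4A) might look like this. The function names, the table name hextbl_cmp and its run-time initialization are assumptions made for the example and are not taken from the patch. Both functions should produce identical output; whether they compile to the same machine code depends on the compiler, as discussed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lookup table: two chars per byte value. */
static char hextbl_cmp[512];

static void
init_hextbl_cmp(void)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < 256; i++)
    {
        hextbl_cmp[2 * i] = digits[i >> 4];
        hextbl_cmp[2 * i + 1] = digits[i & 0xF];
    }
}

/* memcpy form: one fixed-size two-byte copy per input byte. */
static size_t
hex_encode_memcpy(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);

        memcpy(dst, &hextbl_cmp[2 * usrc], 2);
        dst += 2;
    }
    return len * 2;
}

/* Manual form (4A): two separate byte stores per input byte. */
static size_t
hex_encode_bytewise(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);

        *dst++ = hextbl_cmp[2 * usrc];
        *dst++ = hextbl_cmp[2 * usrc + 1];
    }
    return len * 2;
}
```

Neither function writes a terminating NUL; the caller is expected to provide at least 2 * len bytes of output space.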
> A variant could be
>
> + const char *hexptr = &hextbl[2 * usrc];
> + *dst++ = hexptr[0];
> + *dst++ = hexptr[1];
>
> but this supposes that the compiler fails to see the common
> subexpression in the other formulation, which I believe
> most modern compilers will see.
This gets combined into a single two-byte load/store starting with
clang 5, but not with gcc 14.2 or gcc trunk (4B below). I have gcc 14.2
handy, and on my machine the bytewise loads/stores land somewhere in
the middle:
master 1158.969 ms
v3 776.791 ms
variant 4A 775.777 ms
variant 4B 969.945 ms
https://godbolt.org/z/ajToordKq
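For readers who want to try the pointer-based variant (4B) outside the tree, a minimal standalone sketch follows. The names hextbl_ptr, init_hextbl_ptr and hex_encode_hexptr are hypothetical, and the table is again built at run time purely for brevity. This only shows the shape of the loop; it is not the benchmark that produced the timings above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the lookup table: two chars per byte value. */
static char hextbl_ptr[512];

static void
init_hextbl_ptr(void)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < 256; i++)
    {
        hextbl_ptr[2 * i] = digits[i >> 4];
        hextbl_ptr[2 * i + 1] = digits[i & 0xF];
    }
}

/* Variant 4B: compute the table entry's address once, then copy two bytes. */
static size_t
hex_encode_hexptr(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);
        const char *hexptr = &hextbl_ptr[2 * usrc];

        *dst++ = hexptr[0];
        *dst++ = hexptr[1];
    }
    return len * 2;
}
```

Compiling this with different compiler versions and inspecting the generated code for the loop body is one way to check whether the two byte stores were merged.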
--
John Naylor
Amazon Web Services