From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date: | 2025-01-15 11:06:32 |
Message-ID: | CAEudQAqsYN2+i_05NvyS0csQbukmgoP2xX7RAp6niHTapO7i1w@mail.gmail.com |
Lists: | pgsql-hackers |
Hi.
On Wed, Jan 15, 2025 at 07:57, John Naylor <johncnaylorls(at)gmail(dot)com>
wrote:
> On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Couple of thoughts:
> >
> > 1. I was actually hoping for a comment on the constant's definition,
> > perhaps along the lines of
> >
> > /*
> > * The hex expansion of each possible byte value (two chars per value).
> > */
>
> Works for me. With that, did you mean we then wouldn't need a comment
> in the code?
>
> > 2. Since "src" is defined as "const char *", I'm pretty sure that
> > pickier compilers will complain that
> >
> > + unsigned char usrc = *((unsigned char *) src);
> >
> > results in casting away const. Recommend
> >
> > + unsigned char usrc = *((const unsigned char *) src);
>
> Thanks for the reminder!
>
> > 3. I really wonder if
> >
> > + memcpy(dst, &hextbl[2 * usrc], 2);
> >
> > is faster than copying the two bytes manually, along the lines of
> >
> > + *dst++ = hextbl[2 * usrc];
> > + *dst++ = hextbl[2 * usrc + 1];
> >
> > Compilers that inline memcpy() may arrive at the same machine code,
> > but why rely on the compiler to make that optimization? If the
> > compiler fails to do so, an out-of-line memcpy() call will surely
> > be a loser.
>
> See measurements at the end. As for compilers, gcc 3.4.6 and clang
> 3.0.0 can inline the memcpy. The manual copy above only gets combined
> to a single word starting with gcc 12 and clang 15, and latest MSVC
> still can't do it (4A in the godbolt link below). Are there any
> buildfarm animals around that may not inline memcpy for word-sized
> input?
>
> > A variant could be
> >
> > + const char *hexptr = &hextbl[2 * usrc];
> > + *dst++ = hexptr[0];
> > + *dst++ = hexptr[1];
> >
> > but this supposes that the compiler fails to see the common
> > subexpression in the other formulation, which I believe
> > most modern compilers will see.
>
> This combines to a single word starting with clang 5, but does not
> work on gcc 14.2 or gcc trunk (4B below). I have gcc 14.2 handy, and
> on my machine bytewise load/stores are somewhere in the middle:
>
> master 1158.969 ms
> v3 776.791 ms
> variant 4A 775.777 ms
> variant 4B 969.945 ms
>
> https://godbolt.org/z/ajToordKq
Your godbolt example has an important difference that changes the
generated assembly:
-static const char hextbl[] =
"000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"
;
+static const char hextbl[512] =
"000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"
;
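To make the difference concrete, here is a minimal sketch with a miniature table. The names tbl_unsized, tbl_sized and mini_hex_encode are made up for illustration and are not in the patch. It shows only the language-level effect I am certain of: with [] the array size is inferred and includes the terminating NUL, while with an explicit size equal to the string length no NUL is stored (valid in C, not in C++). Whether that alone accounts for the different assembly is an assumption I have not verified on every compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature tables standing in for the 512-character hextbl. */
static const char tbl_unsized[] = "00010203";  /* 8 chars plus the implicit NUL */
static const char tbl_sized[8] = "00010203";   /* exactly 8 chars, no NUL stored */

/*
 * Hex-encode len bytes from src into dst using the two-byte memcpy
 * formulation quoted above. Every input byte must be < 4 here, because
 * the miniature table only covers the values 0..3.
 * Returns the number of characters written; dst is not NUL-terminated.
 */
static size_t
mini_hex_encode(const char *src, size_t len, char *dst, const char *tbl)
{
    char *start = dst;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char usrc = *((const unsigned char *) &src[i]);

        assert(usrc < 4);
        memcpy(dst, &tbl[2 * usrc], 2);
        dst += 2;
    }
    return (size_t) (dst - start);
}
```

By the same rule, the full table is 513 bytes when declared with [] and 512 bytes when declared with [512]. The encoded output is the same with either table.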
best regards,
Ranier Vilela