Re: [PATCH] Hex-coding optimizations using SVE on ARM.

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date: 2025-01-16 00:42:09
Message-ID: CAApHDvpJr7r0n6TY_0psumA1uLy7+nmdq==L3jjDT-RYPMoHRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 15 Jan 2025 at 23:57, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Compilers that inline memcpy() may arrive at the same machine code,
> > but why rely on the compiler to make that optimization? If the
> > compiler fails to do so, an out-of-line memcpy() call will surely
> > be a loser.
>
> See measurements at the end. As for compilers, gcc 3.4.6 and clang
> 3.0.0 can inline the memcpy. The manual copy above only gets combined
> to a single word starting with gcc 12 and clang 15, and latest MSVC
> still can't do it (4A in the godbolt link below). Are there any
> buildfarm animals around that may not inline memcpy for word-sized
> input?
>
> > A variant could be
> >
> > + const char *hexptr = &hextbl[2 * usrc];
> > + *dst++ = hexptr[0];
> > + *dst++ = hexptr[1];

I'd personally much rather see us using memcpy() for this sort of
stuff. If the compiler is too braindead to inline tiny
constant-and-power-of-two-sized memcpys then we'd probably also have
plenty of other performance issues with that compiler already. I don't
think contorting the code into something less human-readable and
something the compiler may struggle even more to optimise is a good
idea. The nieve way to implement the above requires two MOVs of
single bytes and two increments of dst. I imagine it's easier for the
compiler to inline a small constant-sized memcpy() than to figure out
that it's safe to implement the above with a single word-sized MOV
rather than two byte-sized MOVs due to the "dst++" in between the two.

I agree that the evidence you (John) gathered is enough reason to use memcpy().

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-01-16 00:42:49 Re: Infinite loop in XLogPageRead() on standby
Previous Message Nathan Bossart 2025-01-15 23:47:51 Re: convert libpgport's pqsignal() to a void function