Re: Improving spin-lock implementation on ARM.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Krunal Bauskar <krunalbauskar(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving spin-lock implementation on ARM.
Date:	2020-11-30 18:21:15
Message-ID:	1274781.1606760475@sss.pgh.pa.us
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I tend to think that LSE is enabled by default in Apple's clang based
> on your previous message[1]. In order to dispel the doubts could you
> please provide assembly of SpinLockAcquire for following clang
> options.
> "-O2"
> "-O2 -march=armv8-a+lse"
> "-O2 -march=armv8-a"

Huh. Those options make exactly zero difference to the code generated
for SpinLockAcquire/SpinLockRelease; it's the same as I showed upthread,
for either the HEAD definition of TAS() or the CAS patch's version.
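For readers without the thread at hand, the two variants being compared can be sketched roughly as follows. This is an illustration only, not the actual s_lock.h code or the patch: the names slock_t, tas_swap, tas_cas and spin_release are hypothetical stand-ins, and the sketch assumes the HEAD-style TAS() is built on the GCC/clang __sync_lock_test_and_set builtin while the CAS version uses __atomic_compare_exchange_n.

```c
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held. */
typedef int slock_t;

/*
 * Swap-style test-and-set: unconditionally store 1 and return the
 * previous value.  Nonzero means the lock was already held.
 */
static inline int
tas_swap(volatile slock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1);
}

/*
 * CAS-style test-and-set: store 1 only if the lock word is currently 0.
 * Returns nonzero if the lock was already held (the exchange failed).
 */
static inline int
tas_cas(volatile slock_t *lock)
{
	slock_t		expected = 0;

	return !__atomic_compare_exchange_n(lock, &expected, 1, false,
										__ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}

/* Release the lock by storing 0 with release semantics. */
static inline void
spin_release(volatile slock_t *lock)
{
	__sync_lock_release(lock);
}
```

Compiling a file like this with "clang -O2 -S" under each -march setting is one way to compare the generated instructions. With LSE available, compilers typically emit single swap or compare-and-swap instructions; without it, they typically fall back to a load-exclusive/store-exclusive loop.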

So now I'm at a loss as to the reason for the performance difference
I got. -march=armv8-a+lse does make a difference to code generation
someplace, because the overall size of the postgres executable changes
by 16kB or so. One might argue that the performance difference is due
to better code elsewhere than the spinlocks ... but the test I'm running
is basically just

    while (count-- > 0)
    {
        XLogGetLastRemovedSegno();

        CHECK_FOR_INTERRUPTS();
    }

so it's hard to see where a non-spinlock-related code change would come
in. That loop itself definitely generates the same code either way.

I did find this interesting output from "clang -v":

-target-cpu vortex -target-feature +v8.3a -target-feature +fp-armv8 -target-feature +neon -target-feature +crc -target-feature +crypto -target-feature +fullfp16 -target-feature +ras -target-feature +lse -target-feature +rdm -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +sha2 -target-feature +aes

whereas adding -march=armv8-a+lse changes that to just

-target-cpu vortex -target-feature +neon -target-feature +lse -target-feature +zcm -target-feature +zcz

On the whole, that would make one think that -march=armv8-a+lse
should produce worse code than the default settings.
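Besides reading the "clang -v" output, one way to check whether a given set of options enables LSE is to test the ACLE predefined macro __ARM_FEATURE_ATOMICS, which compilers define when the atomics extension is enabled for the target. A minimal sketch, with the function name lse_enabled_at_compile_time being a hypothetical helper and not anything in the PostgreSQL tree:

```c
/*
 * Report whether the compiler was told that LSE atomics are available.
 * __ARM_FEATURE_ATOMICS is the ACLE feature macro for this; it is not
 * defined on non-ARM targets or when LSE is not enabled.
 */
static int
lse_enabled_at_compile_time(void)
{
#ifdef __ARM_FEATURE_ATOMICS
	return 1;
#else
	return 0;
#endif
}
```

Building this once with plain "-O2" and once with "-O2 -march=armv8-a" and comparing the results would show whether the default configuration already includes LSE.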

regards, tom lane

