From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] |
Date: | 2011-12-19 21:25:06 |
Message-ID: | 4EEFABB2.5050100@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 19.12.2011 22:12, Tom Lane wrote:
> Noah Misch<noah(at)leadboat(dot)com> writes:
>> On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
>>> That is not sufficient on platforms with a weak memory model, like Itanium.
>
>> Other processors may observe the lock as held after its release, but there's no
>> correctness problem.
>
> How weak is the memory model, exactly?
>
> A correctness problem would ensue if out-of-order stores are possible,
> ie other processors could observe the lock being freed (and then acquire
> it) before seeing changes to shared variables that had been made while
> holding the lock.
>
> I'm a little dubious that this applies to Itanium, because I don't see
> any memory fence instruction in the TAS macro. If we needed one in
> S_UNLOCK I'd rather expect there to be one in TAS as well.
There's a pretty good manual called "Implementing Spinlocks on Itanium
and PA-RISC" from HP at:
http://h21007.www2.hp.com/portal/download/files/unprot/itanium/spinlocks.pdf.
It says, in section "3.2.1 Try to get a spinlock":
> Note also, that the ‘xchg’ instruction always
> has the ‘acquire’ semantics, so it enforces the correct memory ordering.
But the current implementation seems to be safe, anyway. In section
"3.2.3 Freeing a spinlock", that manual says:
> Note, that a simple assignment statement into a volatile variable
will not work, as we are
> assuming that the +Ovolatile=__unordered compile option is being used.
So +Ovolatile=__unordered would allow the kind of optimization that I
thought was possible, and we would have a problem if we used it. But the
default is more conservative, and a simple assignment does work.
I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The relevant
assembler output is:
xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
ld1.acq r16 = [r11] // M [slocktest.c: 67/3]
nop.i 0 // I
//file/line/col slocktest.c/68/3
st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
st4.rel [r15] = r0 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1
The trick I missed is that the compiler attaches .rel to all the stores
and .acq to all the loads through a volatile pointer. gcc seems to do
the same. So we're safe.
It would be interesting to test whether using +Ovolatile=__unordered
would make a difference to performance. Tacking those memory fence
modifiers to all the volatile loads and stores might be expensive.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
slocktest.c | text/x-csrc | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2011-12-19 21:53:41 | Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] |
Previous Message | Heikki Linnakangas | 2011-12-19 20:21:22 | Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] |