From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date: 2014-02-14 12:52:45
Message-ID: EE0DD308-3A11-47E7-99FA-1F41C29DE448@phlo.org
Lists: pgsql-hackers
On Feb14, 2014, at 13:36 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
>>> I don't think that can actually happen because the head of the wait list
>>> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
>>> for a while...
>>
>> Hm, true, but does that protect us under all circumstances? If another
>> backend grabs the lock before B gets a chance to do so, B will become the
>> wait queue's head, and anyone who enqueues behind B will do so by updating
>> B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
>> by the original lock holder.
>>
>> The specific sequence I have in mind is:
>>
>> A: Take lock
>> B: Enqueue
>> A: Release lock, set B's lwWaiting to false
>> D: Acquire lock
>> B: Enqueue after spurious wakeup
>> (lock->head := B)
>> C: Enqueue behind B
>> (B->lwWaitLink := C, lock->tail := C)
>> A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
>> (B->lwWaitLink := NULL)
>> D: Release lock, dequeue and wakeup B
>> (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
>> B: Take and release lock, queue appears empty!
>> C blocks indefinitely.
>
> That's assuming either reordering by the compiler or an out-of-order
> store architecture, right?
Yes, it requires that a backend exit the PGSemaphoreLock loop in
LWLockAcquire before the previous lock holder's LWLockRelease has reset its
lwWaitLink to NULL. Only if that happens is there a risk of the PGPROC
already being on a wait queue by the time lwWaitLink is finally reset to NULL.
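
For reference, the two code paths involved look roughly like this
(a simplified sketch, not the verbatim lwlock.c source):

    /* LWLockRelease: wake up the waiters removed from the queue */
    while (head != NULL)
    {
        proc = head;
        head = proc->lwWaitLink;
        proc->lwWaitLink = NULL;   /* (1) unlink from the old queue */
        proc->lwWaiting = false;   /* (2) allow the waiter to proceed */
        PGSemaphoreUnlock(&proc->sem);
    }

    /* LWLockAcquire: waiter side, absorbing spurious wakeups */
    for (;;)
    {
        PGSemaphoreLock(&proc->sem, false);
        if (!proc->lwWaiting)
            break;
        extraWaits++;
    }

Nothing orders stores (1) and (2), so B may observe (2), re-enqueue
itself, and only later have store (1) become visible, clobbering the
link of whatever queue B is on by then.
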
>>>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>>>> We take the backends we awake off the queue by updating the queue's head and
>>>> tail, so the contents of lwWaitLink should only matter once the backend is
>>>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>>>> back to NULL anyway.
>>>
>>> I don't like that, this stuff is hard to debug already, having stale
>>> pointers around doesn't help. I think we should just add the barrier in
>>> the releases with barrier.h and locally use a volatile var in the
>>> branches before that.
>>
>> I like the idea of updating lwWaiting and lwWaitLink while still holding the
>> spinlock better. The costs are probably negligible, and it'd make the code much
>> easier to reason about.
>
> I agree we should do that, but imo not in the backbranches. Anything
> more than the minimal fix in that code should be avoided in the
> stable branches, this stuff is friggin performance sensitive, and the
> spinlock already is a *major* contention point in many workloads.
No argument there. But unless I missed something, there actually is a bug
here, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS, the only other choices on
those branches are to either ignore the bug - it's probably *extremely*
unlikely - or to use a spinlock acquire/release cycle to create a barrier.
The former leaves me with a bit of an uneasy feeling, and the latter quite
certainly has a larger performance impact than moving the PGPROC updates
into the section protected by the spinlock and using a local array to
remember the backends to wake up.
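
Concretely, I'm thinking of something like this (a hypothetical,
untested sketch - the array bound and variable names are placeholders):

    /* In LWLockRelease: update the PGPROCs while still holding the
     * lock's spinlock, remembering whom to wake in a local array. */
    PGPROC     *wakeup[MAX_BACKENDS];   /* placeholder bound */
    int         nwakeup = 0;
    int         i;

    SpinLockAcquire(&lock->mutex);
    /* ... detach the waiters to be woken from lock->head/tail ... */
    while (head != NULL)
    {
        PGPROC *proc = head;

        head = proc->lwWaitLink;
        proc->lwWaitLink = NULL;
        proc->lwWaiting = false;   /* safe: still inside the spinlock */
        wakeup[nwakeup++] = proc;
    }
    SpinLockRelease(&lock->mutex);

    /* The semaphore operations can stay outside the protected section. */
    for (i = 0; i < nwakeup; i++)
        PGSemaphoreUnlock(&wakeup[i]->sem);

Since a backend can only re-enqueue itself by taking that same spinlock,
the spinlock's acquire/release semantics order our stores to lwWaitLink
and lwWaiting before any subsequent enqueue - no explicit barrier needed.
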
best regards,
Florian Pflug