From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | MauMau <maumau307(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Subject: | Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease |
Date: | 2014-02-11 13:07:57 |
Message-ID: | 20140211130757.GE31598@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2014-02-11 21:46:04 +0900, MauMau wrote:
> From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> >which means they manipulate the lwWaitLink queue without
> >protection. That's done intentionally. The code tries to protect against
> >corruption of the list due to a woken-up backend acquiring a lock (this
> >or an independent one) by only continuing when the lwWaiting flag is set
> >to false. Unfortunately there's absolutely no guarantee that a) the
> >assignments to lwWaitLink and lwWaiting are done in that order b) that
> >the stores are done in-order from the POV of other backends.
> >So what we need to do is to insert a write barrier between the
> >assignments to lwWaitLink and lwWaiting, i.e.
> > proc->lwWaitLink = NULL;
> > pg_write_barrier();
> > proc->lwWaiting = false;
> >the reader side already uses an implicit barrier by using spinlocks.
>
> I've got a report from one customer that they encountered a hang during
> performance benchmarking. They were using PostgreSQL 9.2.4. I remember
> that the stack trace showed many backends blocked forever at LWLockAcquire()
> during a btree insert operation. I'm not sure this has something to do with
> what you are raising, but the release notes for 9.2.5/6 don't suggest any
> fixes for this. So I felt there is something wrong with lwlocks.
>
> Do you think that the issue you raised could cause my customer's problem --
> backends blocking at an lwlock forever?
It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:
    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
to
    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);
which it is permitted to do, yes, that could cause symptoms like you
describe.
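To make the ordering requirement concrete, here is a minimal standalone sketch. It is not the lwlock.c code: DemoProc, demo_release_waiter() and demo_still_waiting() are hypothetical names, and a C11 release store is used as a stand-in for the pg_write_barrier() placement quoted above. The point it illustrates is only that the store to lwWaiting must not become visible before the stores to lwWaitLink.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two PGPROC fields discussed here. */
typedef struct DemoProc
{
    struct DemoProc *lwWaitLink;    /* next waiter in the wait queue */
    atomic_bool      lwWaiting;     /* true while the backend is still queued */
} DemoProc;

/*
 * Pop the first waiter off the queue and mark it as released.
 * Returns the new queue head.
 */
static DemoProc *
demo_release_waiter(DemoProc *head)
{
    DemoProc *proc = head;

    assert(proc != NULL);
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;

    /*
     * Release store: neither the compiler nor the CPU may move the
     * lwWaitLink writes above to after this store. This plays the role
     * that a write barrier between the two assignments would play
     * (assumption: C11 atomics instead of PostgreSQL's barrier macros).
     */
    atomic_store_explicit(&proc->lwWaiting, false, memory_order_release);
    return head;
}

/* Waiter side: the acquire load pairs with the release store above. */
static bool
demo_still_waiting(DemoProc *proc)
{
    return atomic_load_explicit(&proc->lwWaiting, memory_order_acquire);
}
```

A waiter that sees demo_still_waiting() return false is then guaranteed to also see its lwWaitLink already reset, so it can safely queue itself on another lock. With plain, unordered stores that guarantee does not exist, which is the reordering shown above.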
Any chance you still have the binaries the customer ran back then?
Disassembling that piece of code might give you a hint whether that's a
possible cause.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services