From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Condition variable live lock |
Date: | 2017-12-28 23:16:20 |
Message-ID: | CAEepm=1_S2Ly3Q53yViq29RVJmvaUw8hXs5_ekg_E1uHrNtXGQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Dec 22, 2017 at 4:46 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> while (ConditionVariableSignal(cv))
> ++nwoken;
>
> The problem is that another backend can be woken up, determine that it
> would like to wait for the condition variable again, and then get
> itself added to the back of the wait queue *before the above loop has
> finished*, so this interprocess ping-pong isn't guaranteed to
> terminate. It seems that we'll need something slightly smarter than
> the above to avoid that.
Here is one way to fix it: track the wait queue size and use that
number to limit the wakeup loop. See attached.
That's unbackpatchable though, because it changes the size of struct
ConditionVariable, potentially breaking extensions compiled against an
earlier point release. Maybe this won't actually cause problems in v10
anyway? It requires a particular interaction pattern that barrier.c
produces but more typical client code might not: the awoken
backends keep re-adding themselves because they're waiting for
everyone (including the waker) to do something, but the waker is stuck
in that broadcast loop.
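To make the idea above concrete, here is a minimal single-threaded sketch of a broadcast loop bounded by the tracked queue size. It is not the attached patch and does not use PostgreSQL's real ConditionVariable API: ToyCV, toy_cv_enqueue, toy_cv_signal and both broadcast functions are hypothetical names, and the "requeue" flag merely simulates a woken backend that immediately re-adds itself to the back of the queue (the barrier.c-style pattern). The step_limit argument exists only so the naive variant terminates in this toy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 64

/* Toy stand-in for a condition variable (hypothetical, not PostgreSQL's
 * struct ConditionVariable): a FIFO ring of waiter ids plus a count. */
typedef struct ToyCV
{
    int    waiters[MAX_WAITERS];
    size_t head;        /* index of the oldest waiter */
    size_t nwaiters;    /* tracked wait queue size */
} ToyCV;

static void
toy_cv_init(ToyCV *cv)
{
    cv->head = 0;
    cv->nwaiters = 0;
}

/* Add a waiter to the back of the queue. */
static void
toy_cv_enqueue(ToyCV *cv, int id)
{
    assert(cv->nwaiters < MAX_WAITERS);
    cv->waiters[(cv->head + cv->nwaiters) % MAX_WAITERS] = id;
    cv->nwaiters++;
}

/* Wake the oldest waiter; return true if anyone was woken.  If requeue is
 * true, the woken waiter immediately re-adds itself to the back, which
 * simulates a backend that decides to wait on the same CV again. */
static bool
toy_cv_signal(ToyCV *cv, bool requeue)
{
    int id;

    if (cv->nwaiters == 0)
        return false;
    id = cv->waiters[cv->head];
    cv->head = (cv->head + 1) % MAX_WAITERS;
    cv->nwaiters--;
    if (requeue)
        toy_cv_enqueue(cv, id);
    return true;
}

/* Naive broadcast: keep signalling until the queue looks empty.  With
 * requeueing waiters it never does; step_limit only stops the demo. */
static int
toy_cv_broadcast_naive(ToyCV *cv, bool requeue, int step_limit)
{
    int nwoken = 0;

    while (nwoken < step_limit && toy_cv_signal(cv, requeue))
        ++nwoken;
    return nwoken;
}

/* Bounded broadcast: snapshot the queue size first and wake at most that
 * many waiters, so re-added waiters cannot extend the loop. */
static int
toy_cv_broadcast_bounded(ToyCV *cv, bool requeue)
{
    size_t limit = cv->nwaiters;
    int    nwoken = 0;

    for (size_t i = 0; i < limit; i++)
    {
        if (!toy_cv_signal(cv, requeue))
            break;
        ++nwoken;
    }
    return nwoken;
}
```

With three requeueing waiters, the naive loop runs until step_limit is exhausted, while the bounded loop wakes exactly three and returns, leaving the three re-added waiters queued for a later broadcast. A real implementation would also need to read and update the count under the CV's lock, which this toy omits.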
Thoughts?
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
fix-cv-livelock.patch | application/octet-stream | 3.9 KB |