From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Condition variable live lock |
Date: | 2017-12-29 19:38:43 |
Message-ID: | 20171229193843.tsbob2xsjfmxkq2g@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2017-12-29 12:16:20 +1300, Thomas Munro wrote:
> Here is one way to fix it: track the wait queue size and use that
> number to limit the wakeup loop. See attached.
>
> That's unbackpatchable though, because it changes the size of struct
> ConditionVariable, potentially breaking extensions compiled against an
> earlier point release. Maybe this problem won't really cause problems
> in v10 anyway? It requires a particular interaction pattern that
> barrier.c produces but more typical client code might not: the awoken
> backends keep re-adding themselves because they're waiting for
> everyone (including the waker) to do something, but the waker is stuck
> in that broadcast loop.
Hm, I'm not quite convinced by this approach. Partially because of the
backpatch issue you mention, partially because using the list length as
a limit doesn't seem quite nice.
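To make the trade-off concrete, here is a minimal single-threaded model of the livelock and of the list-length limit discussed above. It is a sketch under simplifying assumptions, not PostgreSQL's condition_variable.c. `WaitQueue`, `queue_push`, `queue_pop`, `wake_and_requeue`, `broadcast_until_empty` and `broadcast_limited` are hypothetical names, the spinlock is omitted, and a woken waiter re-adds itself immediately. That last assumption mimics the barrier.c pattern, where awoken backends keep re-queuing while the waker is still in its broadcast loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size FIFO standing in for the CV's proclist wait queue. */
#define QUEUE_CAP 64

typedef struct
{
    int     procnos[QUEUE_CAP];
    size_t  head;   /* index of the oldest queued waiter */
    size_t  count;  /* number of waiters currently queued */
} WaitQueue;

static void
queue_push(WaitQueue *q, int procno)
{
    assert(q->count < QUEUE_CAP);
    q->procnos[(q->head + q->count) % QUEUE_CAP] = procno;
    q->count++;
}

static bool
queue_pop(WaitQueue *q, int *procno)
{
    if (q->count == 0)
        return false;
    *procno = q->procnos[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}

/*
 * Barrier-like waiter (simplified): once woken, it sees that not everyone
 * has arrived yet and immediately puts itself back on the queue.
 */
static void
wake_and_requeue(WaitQueue *q, int procno)
{
    queue_push(q, procno);
}

/*
 * Naive broadcast: signal one waiter at a time until the queue is empty.
 * With re-queuing waiters the queue never empties, so max_iters caps the
 * model. Returns the number of wakeups performed.
 */
static int
broadcast_until_empty(WaitQueue *q, int max_iters)
{
    int procno;
    int n = 0;

    while (n < max_iters && queue_pop(q, &procno))
    {
        wake_and_requeue(q, procno);
        n++;
    }
    return n;
}

/*
 * Length-limited broadcast: snapshot the queue length first and wake at
 * most that many waiters; re-added waiters wait for the next broadcast.
 */
static int
broadcast_limited(WaitQueue *q)
{
    size_t  limit = q->count;   /* in real code, read under the spinlock */
    int     procno;
    int     n = 0;

    while ((size_t) n < limit && queue_pop(q, &procno))
    {
        wake_and_requeue(q, procno);
        n++;
    }
    return n;
}
```

With three re-queuing waiters, `broadcast_until_empty()` stops only because of its iteration cap, while `broadcast_limited()` returns after three wakeups. In the real struct, the limit requires storing a queue length, which is what makes the change unbackpatchable.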
Given that the proclist_contains() checks in condition_variable.c are
already racy, I think it might be feasible to collect all procnos to
signal while holding the spinlock, and then signal all of them in one
go.
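The collect-then-signal idea can be sketched in the same simplified style. Again the names (`WaitList`, `fake_spin_acquire`, `fake_spin_release`, `wake`, `broadcast_collect_then_signal`) are hypothetical. The "spinlock" is only a flag that checks correct pairing, and the fixed `MAX_WAITERS` bound is an assumption standing in for the maximum number of backends. While holding the lock, the broadcaster copies every queued procno into a local array and empties the list. Only after releasing the lock does it signal each one, so waiters that re-add themselves land on the list for the next broadcast instead of extending the current one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 64      /* hypothetical bound, like MaxBackends */

typedef struct
{
    int     procnos[MAX_WAITERS];
    size_t  count;
} WaitList;

/* Stand-ins for SpinLockAcquire/SpinLockRelease: no real locking, just checks. */
static bool lock_held = false;
static void fake_spin_acquire(void) { assert(!lock_held); lock_held = true; }
static void fake_spin_release(void) { assert(lock_held); lock_held = false; }

static void
waitlist_add(WaitList *w, int procno)
{
    assert(w->count < MAX_WAITERS);
    w->procnos[w->count++] = procno;
}

/* Hypothetical wakeup: the barrier-style waiter re-adds itself at once. */
static int wakeups = 0;

static void
wake(WaitList *w, int procno)
{
    assert(!lock_held);     /* signalling never happens under the lock */
    wakeups++;
    fake_spin_acquire();
    waitlist_add(w, procno);
    fake_spin_release();
}

static size_t
broadcast_collect_then_signal(WaitList *w)
{
    int     batch[MAX_WAITERS];
    size_t  n;

    /* Phase 1: under the lock, detach every current waiter. */
    fake_spin_acquire();
    n = w->count;
    for (size_t i = 0; i < n; i++)
        batch[i] = w->procnos[i];
    w->count = 0;
    fake_spin_release();

    /* Phase 2: signal the detached waiters without holding the lock. */
    for (size_t i = 0; i < n; i++)
        wake(w, batch[i]);
    return n;
}
```

The copy loop in phase 1 still runs with the lock held; that is the trade-off acknowledged in the following paragraph.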
Obviously it'd be nicer to not hold a spinlock while looping, but that
seems like something we can't fix in the back branches. [insert rant
about never using spinlocks unless there are very, very clear, convincing
reasons].
- Andres