Re: Condition variable live lock

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Condition variable live lock
Date: 2018-01-04 20:54:47
Message-ID: 20180104205447.mdaub47aunp3h3mq@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-01-04 12:39:47 -0500, Robert Haas wrote:
> > Given that the proclist_contains() checks in condition_variable.c are
> > already racy, I think it might be feasible to collect all procnos to
> > signal while holding the spinlock, and then signal all of them in one
> > go.
>
> That doesn't seem very nice at all. Not only does it violate the
> coding rule against looping while holding a spinlock, but it seems
> that it would require allocating memory while holding one, which is a
> non-starter.

We could just allocate a sufficiently sized buffer beforehand. There's an
obvious upper bound, so that shouldn't be a big issue.
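
A minimal standalone sketch of that idea, not PostgreSQL code: the buffer
is sized before the lock is taken, the waiting proc numbers are copied
into it under the spinlock, and the wakeups happen only after release.
Everything here (ToyCV, toy_wakeup, MAX_WAITERS, the atomic_flag
spinlock) is a hypothetical stand-in; the real code would use the
proclist and latch machinery. Note the copy still loops while holding
the lock, although the loop is bounded.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical upper bound; stands in for the maximum number of waiters. */
#define MAX_WAITERS 8

/* Toy stand-in for a condition variable's wait list. */
typedef struct
{
    atomic_flag mutex;              /* toy spinlock */
    int         waiters[MAX_WAITERS];   /* proc numbers of queued waiters */
    int         nwaiters;
} ToyCV;

/* Records which procs were "signaled"; stands in for setting a latch. */
static int signaled[MAX_WAITERS];
static int nsignaled = 0;

static void
toy_wakeup(int procno)
{
    assert(nsignaled < MAX_WAITERS);
    signaled[nsignaled++] = procno;
}

static void
toy_lock(ToyCV *cv)
{
    while (atomic_flag_test_and_set_explicit(&cv->mutex, memory_order_acquire))
        ;                           /* spin */
}

static void
toy_unlock(ToyCV *cv)
{
    atomic_flag_clear_explicit(&cv->mutex, memory_order_release);
}

static void
toy_cv_init(ToyCV *cv)
{
    atomic_flag_clear(&cv->mutex);
    cv->nwaiters = 0;
}

/* Returns 1 if the waiter was queued, 0 if the list is full. */
static int
toy_cv_add_waiter(ToyCV *cv, int procno)
{
    int added = 0;

    toy_lock(cv);
    if (cv->nwaiters < MAX_WAITERS)
    {
        cv->waiters[cv->nwaiters++] = procno;
        added = 1;
    }
    toy_unlock(cv);
    return added;
}

/* Collect all waiters under the lock, then signal them after releasing it. */
static int
toy_cv_broadcast(ToyCV *cv)
{
    int to_wake[MAX_WAITERS];       /* sized up front: no allocation under the lock */
    int n = 0;

    toy_lock(cv);
    for (int i = 0; i < cv->nwaiters; i++)
        to_wake[n++] = cv->waiters[i];  /* bounded copy, nothing else */
    cv->nwaiters = 0;
    toy_unlock(cv);

    for (int i = 0; i < n; i++)
        toy_wakeup(to_wake[i]);     /* potentially slow work, lock not held */
    return n;
}
```

With two queued waiters, toy_cv_broadcast() empties the list in one
critical section and wakes both afterwards, so a waiter that re-queues
itself right away cannot be signaled again by the same broadcast.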

> > Obviously it'd be nicer to not hold a spinlock while looping, but that
> > seems like something we can't fix in the back branches. [insert rant
> > about never using spinlocks unless there's very very clear convincing
> > reasons].
>
> I don't think that's a coding rule that I'd be prepared to endorse.
> We've routinely used spinlocks for years in cases where the critical
> section was very short, just to keep the overhead down.

The problem is that, because of how contention is handled, they really
don't keep the overhead that low unless you're strictly optimizing for a
minimal number of cycles and have very little contention, which isn't
actually common. I think part of the conventional wisdom about when to
use spinlocks vs. lwlocks went out of the window once we got
better-scaling lwlocks.

> It's not clear to me that we entirely need a back-patchable fix for
> this. It could be that parallel index scan can have the same issue,
> but I'm not aware of any user complaints.

I don't think many users are going to be able to diagnose this one, and
it's probably not easily diagnosable even if they complain about
performance.

Greetings,

Andres Freund
