Re: Reduce ProcArrayLock contention

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce ProcArrayLock contention
Date: 2015-08-21 18:31:12
Message-ID: 20150821183112.GG8552@awork2.anarazel.de

On 2015-08-21 14:08:36 -0400, Robert Haas wrote:
> On Wed, Aug 19, 2015 at 11:39 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > There's a bunch of issues with those two blocks afaics:
> >
> > 1) The first block (in one backend) might read INVALID_PGPROCNO before
> > ever locking the semaphore if a second backend quickly enough writes
> > INVALID_PGPROCNO. That way the semaphore will be permanently out of
> > "balance".
>
> Yes, this is a clear bug. I think the fix is to do one unconditional
> PGSemaphoreLock(&proc->sem) just prior to the loop.

I don't think that fixes it completely; see the following.
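
To spell it out, the follower-side wait as I read the patch is roughly
the following (hand-written sketch, not a quote of the code; the atomic
accessor and the exact field layout are just assumptions):

for (;;)
{
    if (pg_atomic_read_u32(&proc->nextClearXidElem) == INVALID_PGPROCNO)
        break;          /* can hit this before ever locking the semaphore */
    PGSemaphoreLock(&proc->sem);
}
/*
 * If the leader stores INVALID_PGPROCNO and does its
 * PGSemaphoreUnlock(&proc->sem) quickly enough, the follower breaks out
 * on the first iteration without ever consuming that wakeup, leaving the
 * semaphore with a surplus count.  An unconditional PGSemaphoreLock()
 * before the loop absorbs that one wakeup, but the ordering problems
 * below remain.
 */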

> > 2) There's no memory barriers around dealing with nextClearXidElem in
> > the first block. Afaics there needs to be a read barrier before
> > returning, otherwise it's e.g. not guaranteed that the woken up
> > backend sees its own xid set to InvalidTransactionId.
>
> I can't believe it actually works like that. Surely a semaphore
> operation is a full barrier. Otherwise this could fail:

> P1: a = 0;
> P1: PGSemaphoreLock(&P1);
> P2: a = 1;
> P2: PGSemaphoreUnlock(&P1);
> P1: Assert(a == 1);
>
> Do you really think that fails anywhere?

No, if it's paired like that, I don't think it's allowed to fail.

But, as the code stands, there's absolutely no guarantee you're not
seeing something like:
P1: a = 0;
P1: b = 0;
P1: PGSemaphoreLock(&P1);
P2: a = 1;
P2: PGSemaphoreUnlock(&P1); -- unrelated unlock, e.g. from an earlier ProcSendSignal
P1: Assert(a == b == 1);
P2: b = 1;
P2: PGSemaphoreUnlock(&P1);

If the pairing is like this, there are no guarantees anymore, right? Even
if a and b had been set before P1's assert, the assert would still be
allowed to fail, because there's no enforced ordering on when the stores
to a and b become visible.
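
What I'd want for (2) is an explicitly paired barrier, roughly like this
(again only a sketch; whether nextClearXidElem is a pg_atomic_uint32, and
which field the leader actually clears, are assumptions here):

/* leader, per follower, before waking it: */
proc->xid = InvalidTransactionId;       /* the store the follower must observe */
pg_write_barrier();
pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
PGSemaphoreUnlock(&proc->sem);

/* follower, before returning from the wait loop: */
while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
    PGSemaphoreLock(&proc->sem);
pg_read_barrier();      /* without this the xid store above may not be visible yet */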

> > 3) If a follower returns before the leader has actually finished waking
> > that respective backend up we can get into trouble:
...
> So I don't see the problem.

Don't have time (nor spare brain capacity) to answer in detail right
now.

> > I don't think it's a good idea to use the variable name in PROC_HDR and
> > PGPROC, it's confusing.
>
> It made sense to me, but I don't care that much if you want to change it.

Please.

> > How hard did you try checking whether this causes regressions? This
> > increases the number of atomics in the commit path a fair bit. I doubt
> > it's really bad, but it seems like a good idea to benchmark something
> > like a single full-throttle writer and a large number of readers.
>
> Hmm, that's an interesting point. My analysis was that it really only
> increased the atomics in the cases where we otherwise would have gone
> to sleep, and I figured that the extra atomics were a small price to
> pay for not sleeping.

You're probably right that it won't have a big impact, if any. Seems
easy enough to test though, and it's really the only sane adversarial
scenario I could come up with...

> It's worth point out, though, that your lwlock.c atomics changes have
> the same effect.

To some degree, yes. I tried to benchmark adversarial scenarios rather
extensively because of that... I couldn't make it regress, presumably
because putting a waiter on the waitlist only "hard"-contended with other
exclusive lockers, not with the shared lockers, which could still make
progress.

> Before, if there were any shared lockers on an LWLock and an
> exclusive-locker came along, the exclusive locker would take and
> release the spinlock and go to sleep.

That often spun on a spinlock which was continuously being acquired, so
it was hard to ever get that far...

> The changes we made to the clock sweep are more of the same.

Yea.

> Instead of taking an lwlock, sweeping the clock hand across many
> buffers, and releasing the lwlock, we now perform an atomic operation
> for every buffer. That's obviously "worse" in terms of the total
> number of atomic operations, and if you construct a case where every
> backend sweeps 10 or 20 buffers yet none of them would have contended
> with each other, it might lose. But it's actually much better and
> more scalable in the real world.

I think we should probably optimize that bit of code at some point -
right now the bottleneck appears to be elsewhere though, namely all the
buffer header locks, which are rather likely not to be in any cache at all.
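
For reference, the clock sweep change boils down to something like the
sketch below - written from memory, so treat the field names as
assumptions; the real code also handles uint32 wraparound of the hand
more carefully than a plain modulo:

static inline uint32
ClockSweepTick(void)
{
    uint32 hand;

    /* advance the clock hand without holding any lwlock */
    hand = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

    return hand % NBuffers;
}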

> I think this is a pattern we'll see with atomics over and over again:
> more atomic operations overall in order to reduce the length of time
> for which particular resources are unavailable to other processes.

Yea, agreed.

Greetings,

Andres Freund
