From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Reduce ProcArrayLock contention |
Date: | 2015-08-21 18:31:12 |
Message-ID: | 20150821183112.GG8552@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2015-08-21 14:08:36 -0400, Robert Haas wrote:
> On Wed, Aug 19, 2015 at 11:39 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > There's a bunch of issues with those two blocks afaics:
> >
> > 1) The first block (in one backend) might read INVALID_PGPROCNO before
> > ever locking the semaphore if a second backend quickly enough writes
> > INVALID_PGPROCNO. That way the semaphore will be permanently out of
> > "balance".
>
> Yes, this is a clear bug. I think the fix is to do one unconditional
> PGSemaphoreLock(&proc->sem) just prior to the loop.
I don't think that fixes it it completely, see the following.
> > 2) There's no memory barriers around dealing with nextClearXidElem in
> > the first block. Afaics there needs to be a read barrier before
> > returning, otherwise it's e.g. not guaranteed that the woken up
> > backend sees its own xid set to InvalidTransactionId.
>
> I can't believe it actually works like that. Surely a semaphore
> operation is a full barrier. Otherwise this could fail:
> P1: a = 0;
> P1: PGSemaphoreLock(&P1);
> P2: a = 1;
> P2: PGSemaphoreUnlock(&P1);
> P1: Assert(a == 1);
>
> Do you really think that fails anywhere?
No, if it's paired like that, I don't think it's allowed to fail.
But, as the code stands, there's absolutely no guarantee you're not
seeing something like:
P1: a = 0;
P1: b = 0;
P1: PGSemaphoreLock(&P1);
P2: a = 1;
P2: PGSemaphoreUnlock(&P1); -- unrelated, as e.g. earlier by ProcSendSignal
P1: Assert(a == b == 1);
P2: b = 1;
P2: PGSemaphoreUnlock(&P1);
if the pairing is like this there's no guarantees anymore, right? Even
if a and be were set before P1's assert, the thing would be allowed to
fail, because the store to a or b might each be visible since there's no
enforced ordering.
> > 3) If a follower returns before the leader has actually finished woken
> > that respective backend up we can get into trouble:
...
> So I don't see the problem.
Don't have time (nor spare brain capacity) to answer in detail right
now.
> > I don't think it's a good idea to use the variable name in PROC_HDR and
> > PGPROC, it's confusing.
>
> It made sense to me, but I don't care that much if you want to change it.
Please.
> > How hard did you try checking whether this causes regressions? This
> > increases the number of atomics in the commit path a fair bit. I doubt
> > it's really bad, but it seems like a good idea to benchmark something
> > like a single full-throttle writer and a large number of readers.
>
> Hmm, that's an interesting point. My analysis was that it really only
> increased the atomics in the cases where we otherwise would have gone
> to sleep, and I figured that the extra atomics were a small price to
> pay for not sleeping.
You're probably right that it won't have a big, if any, impact. Seems
easy enough to test though, and it's really the only sane adversarial
scenario I could come up with..
> It's worth point out, though, that your lwlock.c atomics changes have
> the same effect.
To some degree, yes. I tried to benchmark adversarial scenarios rather
extensively because of that... I couldn't make it regress, presumably
because because putting on the waitlist only "hard" contended with other
exclusive lockers, not with the shared lockers which could progress.
> Before, if there were any shared lockers on an LWLock and an
> exclusive-locker came along, the exclusive locker would take and
> release the spinlock and go to sleep.
That often spun (span?) on a spinlock which continously was acquired, so
it was hard to ever get that far...
> The changes we made to the clock sweep are more of the same.
Yea.
> Instead of taking an lwlock, sweeping the clock hand across many
> buffers, and releasing the lwlock, we now perform an atomic operation
> for every buffer. That's obviously "worse" in terms of the total
> number of atomic operations, and if you construct a case where every
> backend sweeps 10 or 20 buffers yet none of them would have contended
> with each other, it might lose. But it's actually much better and
> more scalable in the real world.
I think we probably should optimize that bit of code at some point -
right now the bottleneck appear to be elsewhere though, namely all the
buffer header locks which are rather likely to be in no cache at all.
> I think this is a pattern we'll see with atomics over and over again:
> more atomic operations overall in order to reduce the length of time
> for which particular resources are unavailable to other processes.
Yea, agreed.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-08-21 18:37:36 | Re: (full) Memory context dump considered harmful |
Previous Message | Robert Haas | 2015-08-21 18:28:48 | Re: More WITH |