Re: Scaling shared buffer eviction

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-23 22:54:16
Message-ID: 20140923225416.GI2521@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-09-23 16:29:16 -0400, Robert Haas wrote:
> On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > But this gets at another point: the way we're benchmarking this right
> > now, we're really conflating the effects of three different things:
> >
> > 1. Changing the locking regimen around the freelist and clocksweep.
> > 2. Adding a bgreclaimer process.
> > 3. Raising the number of buffer locking partitions.
>
> I did some more experimentation on this. Attached is a patch that
> JUST does #1, and, as previously suggested, it uses a single spinlock
> instead of using two of them that are probably in the same cacheline.
> Without this patch, on a 32-client, read-only pgbench at scale factor
> 1000 and shared_buffers=8GB, perf -e cs -g -a has this to say about
> LWLock-inspired preemption:
>
> - LWLockAcquire
> + 68.41% ReadBuffer_common
> + 31.59% StrategyGetBuffer
>
> With the patch, unsurprisingly, StrategyGetBuffer disappears and the
> only lwlocks that are causing context-switches are the individual
> buffer locks. But are we suffering spinlock contention instead as a
> result? Yes, but not much. s_lock is at 0.41% in the corresponding
> profile, and only 6.83% of those calls are from the patched
> StrategyGetBuffer. In a similar profile of master, s_lock is at
> 0.31%. So there's a bit of additional s_lock contention, but I think
> it's considerably less than the contention over the lwlock it's
> replacing, because the spinlock is only held for the minimal amount of
> time needed, whereas the lwlock could be held across taking and
> releasing many individual buffer locks.

Am I understanding you correctly that you also measured context switches
for spinlocks? If so, I don't think that's a valid comparison. LWLocks
explicitly yield the CPU as soon as there's any contention while
spinlocks will, well, spin. Sure, they eventually go to sleep too, but it
takes considerably longer to get there.
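To spell out why the numbers aren't comparable, the two contended slow
paths look very roughly like this (heavily simplified sketch, helper and
type names made up, not the actual source):

    /* lwlock-style: give up the CPU at the first sign of contention */
    static void
    lwlock_style_acquire(lock_t *lock)
    {
        while (!try_atomic_acquire(lock))
            queue_self_and_sleep(lock); /* context switch, counted by perf -e cs */
    }

    /* spinlock-style: burn CPU first, sleep only as a last resort */
    static void
    spinlock_style_acquire(slock_t *lock)
    {
        int     spins = 0;

        while (TAS(lock))
        {
            if (++spins > SPINS_PER_DELAY)
            {
                pg_usleep(rand_delay()); /* rare; most of the waiting is spinning */
                spins = 0;
            }
        }
    }

So counting context switches captures essentially all lwlock contention,
but only the tail end of spinlock contention.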

It's also worthwhile to remember in such comparisons that lots of the
cost of spinlocks will be in the calling routines, not in s_lock() - the
first TAS() is inlined into the caller. And that's the one that will
incur the cache misses and such...
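To illustrate where the cost lands, the shape is roughly this
(paraphrased, not the exact spin.h/s_lock.h definitions; the macro name
here is made up):

    /*
     * Paraphrased sketch, not the real definitions.  The point: the
     * uncontended test-and-set expands inline at every call site, and
     * s_lock() is only entered once that first TAS() has already failed.
     * So the cache miss on the lock word gets attributed to the calling
     * routine in the profile, while s_lock() shows only the subsequent
     * spinning.
     */
    #define SPIN_ACQUIRE(lock) \
        do { \
            if (TAS(lock))          /* inlined fast path, takes the cache miss */ \
                s_lock(lock);       /* out-of-line slow path, only on contention */ \
        } while (0)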

Note that I'm explicitly *not* doubting the use of a spinlock
itself. Given the short acquisition times and the fact that every
acquisition is exclusive, a spinlock makes more sense. The lwlock's
internal spinlock alone will have about as much contention.
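(For reference, the current pre-atomics LWLockAcquire() essentially
starts by taking its embedded spinlock anyway - simplified sketch, field
names approximate, not the real lwlock.c:)

    static bool
    lwlock_acquire_exclusive_sketch(LWLock *lock)
    {
        SpinLockAcquire(&lock->mutex);  /* same cache line fight as a bare spinlock */
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            lock->exclusive++;
            SpinLockRelease(&lock->mutex);
            return true;                /* acquired */
        }
        SpinLockRelease(&lock->mutex);
        return false;                   /* caller queues itself and sleeps */
    }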

I think it might be possible to construct some cases where the spinlock
performs worse than the lwlock, but I think those will clearly be in the
minority. And at least some of them will be fixed by the bgreclaimer. As
an example, consider a single-backend COPY ... FROM of large files with
a relatively large s_b. That's a seriously bad workload for the current
victim buffer selection, because frequently most of shared_buffers needs
to be searched to find a buffer with usagecount 0. And this patch will
double the amount of atomic ops during that search.
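For reference, the loop I'm worried about looks roughly like this
(simplified sketch along the lines of the patched StrategyGetBuffer(),
names approximate, not the patch itself):

    static volatile BufferDesc *
    clock_sweep_sketch(void)
    {
        for (;;)
        {
            volatile BufferDesc *buf;

            /* atomic section #1: advance the clock hand under the new spinlock */
            SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
            buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
            if (++StrategyControl->nextVictimBuffer >= NBuffers)
                StrategyControl->nextVictimBuffer = 0;
            SpinLockRelease(&StrategyControl->buffer_strategy_lock);

            /* atomic section #2: check/decrement usage_count under the header lock */
            LockBufHdr(buf);
            if (buf->refcount == 0)
            {
                if (buf->usage_count == 0)
                    return buf;         /* victim found, header lock still held */
                buf->usage_count--;
            }
            UnlockBufHdr(buf);
        }
    }

With a mostly-warm large s_b that loop frequently has to tick over a
large fraction of the pool per allocation, so paying two locked
operations per inspected buffer instead of one adds up quickly.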

Let me try to quantify that.

> What's interesting is that I can't see in the perf output any real
> sign that the buffer mapping locks are slowing things down, but they
> clearly are, because when I take this patch and also boost
> NUM_BUFFER_PARTITIONS to 128, the performance goes up:

Which events did you profile?

> I welcome further testing and comments, but my current inclination is
> to go ahead and push the attached patch. To my knowledge, nobody has
> at any point objected to this aspect of what's being proposed, and it
> looks safe to me and seems to be a clear win. We can then deal with
> the other issues on their own merits.

I've taken a look at this, and all the stuff I saw that I disliked was
there before this patch. So +1 for going ahead.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
