Re: Move unused buffers to freelist

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move unused buffers to freelist
Date: 2013-06-27 12:23:31
Message-ID: CA+TgmobJm0GHk58nUPRQHCGwY25n1DCkU4ku9aQeczZEjiz9mQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> Configuration Details
> O/S - Suse-11
> RAM - 128GB
> Number of Cores - 16
> Server Conf - checkpoint_segments = 300; checkpoint_timeout = 15 min;
> synchronous_commit = OFF; shared_buffers = 14GB; AutoVacuum = off
> Pgbench - Select-only; Scalefactor - 1200; Time - 30 mins
>
>           8C-8T    16C-16T   32C-32T   64C-64T
> Head      62403    101810    99516     94707
> Patch     62827    101404    99109     94744
>
> On 128GB RAM, if we use scalefactor=1200 (database approx. 17GB) and 14GB
> shared buffers, there is no major difference.
> One of the reasons could be that there is not much swapping in shared
> buffers, as most data already fits in shared buffers.

I'd like to just back up a minute and talk about the broader
picture. What are we trying to accomplish with this patch? Last
year, I did some benchmarking on a big IBM POWER7 machine (16 cores,
64 hardware threads). Here are the results:

http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

Now, if you look at these results, you see something interesting.
When there aren't too many concurrent connections, the higher scale
factors are only modestly slower than the lower scale factors. But as
the number of connections increases, performance continues to rise
at the lower scale factors, while at the higher scale factors it
stops rising and in fact drops off. So in other words,
there's no huge *performance* problem for a working set larger than
shared_buffers, but there is a huge *scalability* problem. Now why is
that?

As far as I can tell, the answer is that we've got a scalability
problem around BufFreelistLock. Contention on the buffer mapping
locks may also be a problem, but all of my previous benchmarking (with
LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant
in the room. My interest in having the background writer add buffers
to the free list is basically about solving that problem. It's a
pretty dramatic problem, as the graph above shows, and this patch
doesn't solve it. There may be corner cases where this patch improves
things (or, equally, makes them worse) but as a general point, the
difficulty I've had reproducing your test results and the specificity
of your instructions for reproducing them suggest to me that what we
have here is not a clear improvement on general workloads. Yet such
an improvement should exist, because there are other products in the
world that have scalable buffer managers; we currently don't. Instead
of spending a lot of time trying to figure out whether there's a small
win in narrow cases here (and there may well be), I think we should
back up and ask why this isn't a great big win, and what we'd need to
do to *get* a great big win. I don't see much point in tinkering
around the edges here if things are broken in the middle; things that
seem like small wins or losses now may turn out otherwise in the face
of a more comprehensive solution.
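To make the contention argument concrete, here is a minimal toy model, assuming a simplified design in which one global mutex (standing in for BufFreelistLock) guards both the freelist and the clock-sweep hand. All names (ToyBufferDesc, toy_init, toy_add_to_freelist, toy_get_buffer, NBUFFERS) are hypothetical, and this is not PostgreSQL's actual freelist code; it only illustrates that every backend needing a buffer serializes on the same lock, whether or not the freelist has entries.

```c
#include <assert.h>
#include <pthread.h>

#define NBUFFERS 16           /* hypothetical tiny buffer pool */

/* Hypothetical toy descriptor; not PostgreSQL's BufferDesc. */
typedef struct {
    int usage_count;          /* set on use, decremented by the clock sweep */
    int next_free;            /* next buffer on the freelist, or -1 */
} ToyBufferDesc;

static ToyBufferDesc buffers[NBUFFERS];
static int first_free = -1;   /* head of the freelist, -1 if empty */
static int clock_hand = 0;    /* next buffer the clock sweep will inspect */
static long lock_acquisitions = 0;

/* One global lock guards the freelist AND the clock hand, playing the
   role the email attributes to BufFreelistLock. */
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;

void toy_init(void)
{
    for (int i = 0; i < NBUFFERS; i++) {
        buffers[i].usage_count = 1;
        buffers[i].next_free = -1;
    }
    first_free = -1;
    clock_hand = 0;
    lock_acquisitions = 0;
}

/* A background writer would push buffers it judges reusable here. */
void toy_add_to_freelist(int buf)
{
    pthread_mutex_lock(&freelist_lock);
    lock_acquisitions++;
    buffers[buf].next_free = first_free;
    first_free = buf;
    pthread_mutex_unlock(&freelist_lock);
}

/* Every backend that needs a buffer goes through this one lock. */
int toy_get_buffer(void)
{
    int result = -1;

    pthread_mutex_lock(&freelist_lock);
    lock_acquisitions++;

    /* Fast path: pop the freelist, skipping buffers used since being added. */
    while (first_free >= 0 && result < 0) {
        int buf = first_free;
        first_free = buffers[buf].next_free;
        buffers[buf].next_free = -1;
        if (buffers[buf].usage_count == 0)
            result = buf;
    }

    /* Slow path: clock sweep, still under the same lock. In this toy nothing
       is pinned and usage_count never exceeds 1, so two passes suffice. */
    while (result < 0) {
        int cur = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (buffers[cur].usage_count == 0)
            result = cur;
        else
            buffers[cur].usage_count--;
    }

    buffers[result].usage_count = 1;  /* simulate the caller using the buffer */
    pthread_mutex_unlock(&freelist_lock);

    assert(result >= 0 && result < NBUFFERS);
    return result;
}
```

With a freelist kept full by the background writer, toy_get_buffer takes the fast path, but it still acquires freelist_lock on every call. That is one way to see why merely refilling the freelist need not remove the serialization point behind the scalability problem.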

One thing that occurred to me while writing this note is that the
background writer doesn't have any compelling reason to run on a
read-only workload. It will still run at a certain minimum rate, so
that it cycles the buffer pool every 2 minutes, if I remember
correctly. But it won't run anywhere near fast enough to keep up with
the buffer allocation demands of 8, 32, or 64 sessions all reading, at
top speed, data that doesn't entirely fit in shared_buffers. In fact,
we've had reports that the background writer isn't too effective even
on read-write workloads. The point is - if the background writer
isn't waking up and running frequently enough, what it does when it
does wake up isn't going to matter very much. I think we need to
spend some energy poking at that.
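The minimum-rate point can be put in rough numbers. This sketch assumes the default 8kB block size, the default bgwriter_delay of 200 ms, and the roughly 2-minute whole-pool cycle mentioned above (itself stated as a recollection); the macro and function names are hypothetical. For the 14GB shared_buffers in the quoted configuration, that is 1,835,008 buffers, so the floor is about 3,058 buffers per bgwriter round, or about 15,292 per second.

```c
#include <assert.h>
#include <stdio.h>

/* Assumptions, not measurements: 8kB pages, bgwriter_delay = 200 ms,
   and a ~120 s minimum cycle over the whole buffer pool. */
#define BLOCK_SIZE_KB       8L
#define BGWRITER_DELAY_MS   200.0
#define SCAN_WHOLE_POOL_MS  120000.0

/* Hypothetical helper: number of buffers for shared_buffers given in GB. */
long nbuffers_for_gb(long shared_buffers_gb)
{
    assert(shared_buffers_gb > 0);
    return shared_buffers_gb * 1024L * 1024L / BLOCK_SIZE_KB;
}

/* Buffers that must be inspected per bgwriter round so the whole pool is
   cycled once every SCAN_WHOLE_POOL_MS; this floor ignores actual demand. */
double min_buffers_per_round(long nbuffers)
{
    return nbuffers * (BGWRITER_DELAY_MS / SCAN_WHOLE_POOL_MS);
}

/* The same floor expressed per second. */
double min_buffers_per_second(long nbuffers)
{
    return nbuffers / (SCAN_WHOLE_POOL_MS / 1000.0);
}

void print_floor(long shared_buffers_gb)
{
    long n = nbuffers_for_gb(shared_buffers_gb);
    printf("shared_buffers=%ldGB: %ld buffers, %.0f per round, %.0f per second\n",
           shared_buffers_gb, n, min_buffers_per_round(n),
           min_buffers_per_second(n));
}
```

Whether that floor is enough depends on how many new buffers the sessions actually need per second, which is workload-dependent and not measured here; the point is only that the floor is set by pool size and timing, not by allocation demand.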

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
