Re: performance config help

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Bob Dusek" <redusek(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance config help
Date: 2010-01-11 18:20:11
Message-ID: 4B4B177B020000250002E194@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Bob Dusek <redusek(at)gmail(dot)com> wrote:
> Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Bob Dusek <redusek(at)gmail(dot)com> wrote:

>> Anyway, my benchmarks tend to show that best throughput occurs at
>> about (CPU_count * 2) plus effective_spindle_count. Since you
>> seem to be fully cached, effective_spindle_count would be zero,
>> so I would expect performance to start to degrade when you have
>> more than about 32 sessions active.
>>
> That's a little disheartening for a single or dual CPU system.

Not at all. You only have so many resources to keep busy at any one
moment. It is generally more efficient to only context switch
between as many processes as can keep those resources relatively
busy; otherwise valuable resources are spent switching among the
various processes rather than doing useful work.

[Regular readers of this list might want to skip ahead while I run
through my usual "thought experiment on the topic. ;-) ]

Imagine this hypothetical environment -- you have one CPU running
requests. There are no other resources to worry about and no
latency to the clients. Let's say that the requests take one second
each. The client suddenly has 100 requests to run. Assuming
context switching is free, you could submit all at once, and 100
seconds later, you get 100 responses, with an average response time
of 100 seconds. Let's put a (again free) connection pooler in
there. You submit those 100 requests, but they are fed to the
database one at a time. You get one response back in one second,
the next in two seconds, the last in 100 seconds. No request took
any longer, and the average response time was 50.5 seconds -- almost
a 50% reduction.

Now context switching is not free, and you had tens of thousands of
them per second. Besides the hit on CPU availability during each
switch, you're reducing the value of the L1 and L2 caches. So in
reality, you could expect your "request storm" to perform
significantly worse in comparison to the connection pooled
configuration. In reality, you have more than one resource to keep
busy, so the pool should be sized greater than one; but it's still
true that there is some point at which getting a request to the
database server delays the response to that request more than
queuing it for later execution would. Some database products build
in a way to manage this; in PostgreSQL it's on you to do so.

>> Your vmstat output suggests that context switches are becoming a
>> problem, and I wouldn't be surprised if I heard that the network
>> is an issue. You might want to have someone take a look at the
>> network side to check.
>>
> This is all happening on a LAN, and network throughput doesn't
> seem to be an issue. It may be a busy network, but I'm not sure
> about a problem. Can you elaborate on your suspicion, based on
> the vmstat? I haven't used vmstat much.

It was simply this: all that CPU idle time while it was swamped with
requests suggests that there might be a bottleneck outside the
database server. That could be, as another post suggests, the
client software. It could also be the network. (It could also be
contention on locks within PostgreSQL from the large number of
requests, but that's covered by the connection pooling suggestion.)

> The problem with connection pooling is that we actually have to
> achieve more than 40 per second, which happens to be the sweet
> spot with our current config.

Well, if you're considering a connection pool which can only submit
one request per second, you're looking at the wrong technology. We
use a custom connection pool built into our software, so I'm not
very familiar with the "drop in" packages out there, but we start
the next queued request based on the completion of a request --
there's no polling involved.

Between the RAID 0, fsync = off, and full_page_writes = off -- you
really had better not be staking anything important on this data.
This configuration would make The Flying Wallendas break out in a
sweat. It suggests to me that you might want to look into a better
RAID controller -- a high quality controller with battery-backup
(BBU) cache, configured for write-back, might allow you to change
all these to safe settings. If you also switch to a RAID
configuration with some redundancy, you'll be much safer....

-Kevin

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bob Dusek 2010-01-11 18:57:31 Re: performance config help
Previous Message Scott Marlowe 2010-01-11 18:19:34 Re: performance config help