Re: High CPU Utilization

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Joe Uhl <joeuhl(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: High CPU Utilization
Date: 2009-03-16 22:09:13
Message-ID: dcc563d10903161509l7f7664ekd55a8177d48a89c6@mail.gmail.com
Lists: pgsql-performance

On Mon, Mar 16, 2009 at 2:50 PM, Joe Uhl <joeuhl(at)gmail(dot)com> wrote:
> I dropped the pool sizes and brought things back up.  Things are stable,
> site is fast, CPU utilization is still high.  Probably just a matter of time
> before issue comes back (we get slammed as kids get out of school in the
> US).

Yeah, I'm guessing your server (or more specifically its RAID card)
just isn't up to the task. We had the same problem last year with a
machine with 16 Gig of RAM and dual dual-core 3.0GHz Xeons with a Perc 5
something or other. No matter how we tuned it or played with it, we
just couldn't get good random performance out of it. It's since been
replaced by a white box unit with a Tyan mobo and dual 4-core Opterons
and an Areca 1680 and a 12-drive RAID-10. We can sustain 30 to 60 Megs
a second random access with 0 to 10% iowait.

Here's a typical vmstat 10 output when our load factor is hovering around 8...
 r  b swpd   free  buff    cache si so   bi    bo   in    cs us sy id wa st
 4  1  460 170812 92856 29928156  0  0  604  3986 4863 10146 74  3 20  3  0
 7  1  460 124160 92912 29939660  0  0  812  5701 4829  9733 70  3 23  3  0
13  0  460 211036 92984 29947636  0  0  589  3178 4429  9964 69  3 25  3  0
 7  2  460  90968 93068 29963368  0  0 1067  4463 4915 11081 78  3 14  5  0
 7  3  460 115216 93100 29963336  0  0 3008  3197 4032 11812 69  4 15 12  0
 6  1  460 142120 93088 29923736  0  0 1112  6390 4991 11023 75  4 15  6  0
 6  0  460 157896 93208 29932576  0  0  698  2196 4151  8877 71  2 23  3  0
11  0  460 124868 93296 29948824  0  0  963  3645 4891 10382 74  3 19  4  0
 5  3  460  95960 93272 29918064  0  0  592 30055 5550  7430 56  3 18 23  0
 9  0  460  95408 93196 29914556  0  0 1090  3522 4463 10421 71  3 21  5  0
 9  0  460 128632 93176 29916412  0  0  883  4774 4757 10378 76  4 17  3  0

Note the bursty parts where we're shoving out 30 Megs a second and the
wait jumps to 23%. That's about as bad as it gets during the day for
us. Note that in your graph your bi column appears to be dominating
your bo column, so it looks like you're reaching a point where the
write cache on the controller gets full and your real throughput is
shown to be ~1 megabyte a second outbound, and the inbound traffic
either has priority or is just filling in the gaps. It looks to me
like your RAID card is prioritizing reads over writes, and the whole
system is just slowing to a crawl. I'm willing to bet that if you
were running pure SW RAID with no RAID controller you'd get better
numbers.
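
If you want to pick those bursts out of a longer capture without eyeballing it, something like the following rough Python sketch would do. It assumes the vmstat header line is present, that every row has been unwrapped onto one line, and that bi/bo are in 1 KiB blocks per second (the usual Linux vmstat unit). The function names and the 20% iowait threshold are made up for illustration; they are not part of any tool.

```python
def parse_vmstat(text):
    """Parse vmstat output (header line plus data rows) into a list of dicts."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


def flag_write_bursts(samples, wa_threshold=20):
    """Return (bo in MiB/s, wa) for samples whose iowait is at or above the threshold."""
    # Assumption: bo is reported in 1 KiB blocks per second.
    return [(s["bo"] / 1024, s["wa"]) for s in samples if s["wa"] >= wa_threshold]


# Three rows taken from the vmstat output above.
SAMPLE = """
r b swpd free buff cache si so bi bo in cs us sy id wa st
7 3 460 115216 93100 29963336 0 0 3008 3197 4032 11812 69 4 15 12 0
5 3 460 95960 93272 29918064 0 0 592 30055 5550 7430 56 3 18 23 0
9 0 460 95408 93196 29914556 0 0 1090 3522 4463 10421 71 3 21 5 0
"""

bursts = flag_write_bursts(parse_vmstat(SAMPLE))
for mib_per_s, wa in bursts:
    print(f"bo ~ {mib_per_s:.1f} MiB/s, iowait {wa}%")
```

With the sample rows, only the line with bo at 30055 and wa at 23 gets flagged. Running the same thing over your own capture should show whether your high-wait samples line up with write bursts or with reads.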
