From: Joe Uhl <joeuhl(at)gmail(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: High CPU Utilization
Date: 2009-03-16 20:50:04
Message-ID: F161EF4F-4B72-47AC-A894-C1C1BE7B7E1A@gmail.com
Lists: pgsql-performance
I dropped the pool sizes and brought things back up. Things are
stable, the site is fast, and CPU utilization is still high. It is
probably just a matter of time before the issue comes back (we get
slammed as kids get out of school in the US).
Now when I run vmstat 1 30 it looks very different (below). Waiting
is minimal and user time is very high. Under nontrivial load, according
to xact_commit in pg_stat_database, we are doing 1800+ tps.
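For anyone wanting to reproduce that tps number: xact_commit is a
cumulative counter, so a rough commit rate comes from sampling it twice
and dividing by the interval. A sketch (the database name "mydb" and the
10-second window are placeholders, not from this thread):

```shell
# Rough commit rate from pg_stat_database: sample the cumulative
# xact_commit counter twice, then divide the delta by the interval.
q="SELECT xact_commit FROM pg_stat_database WHERE datname = 'mydb'"
c1=$(psql -At -c "$q")
sleep 10
c2=$(psql -At -c "$q")
echo "approx tps: $(( (c2 - c1) / 10 ))"
```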
Appreciate the input and explanation on vmstat. I am going to throw
some of these numbers into Zabbix so I can keep a better eye on them.
This server is a couple years old so the purchase of a new controller
and/or disks is not out of the question.
On a final note, I have made several changes to postgresql.conf. Some
of those are here:
max_connections = 1000
shared_buffers = 7680MB
work_mem = 30MB
synchronous_commit = off
checkpoint_segments = 50
effective_cache_size = 20000MB
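As a quick sanity check on those sizes (a sketch, not from the thread:
the 24GB RAM figure is an assumption, roughly what the free+buff+cache
totals in the vmstat output below add up to), the two big memory settings
can be expressed as fractions of physical RAM:

```shell
# Express the settings above as a share of physical RAM.
# ram_mb=24576 (24GB) is an assumption, not stated in the thread.
ram_mb=24576
sb_mb=7680      # shared_buffers
ecs_mb=20000    # effective_cache_size
echo "shared_buffers:       $(( 100 * sb_mb / ram_mb ))% of RAM"
echo "effective_cache_size: $(( 100 * ecs_mb / ram_mb ))% of RAM"
```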
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 9  8 73036 500164 82200 23497748 3 4 669 541 1 1 23 3 54 19
20  4 73036 497252 82200 23500836 0 0 2500 680 11145 15168 91 4 2 2
21  1 73036 491416 82204 23503832 0 0 1916 920 10303 14032 94 4 1 1
23  5 73036 489580 82212 23505860 0 0 1348 3296 11682 15970 94 5 1 0
31  1 73036 481408 82220 23507752 0 0 984 8988 10123 11289 97 3 0 0
25  4 73036 483248 82232 23509420 0 0 1268 1312 10705 14063 96 4 0 0
23  4 73036 480096 82232 23512380 0 0 2372 472 9805 13996 94 5 1 1
24  4 73036 476732 82236 23515196 0 0 2012 720 10365 14307 96 3 1 0
22  1 73036 474468 82236 23516584 0 0 944 3108 9838 12831 95 4 1 0
14  1 73036 455756 82284 23534548 0 0 908 3284 9096 11333 94 4 1 0
10  2 73036 455224 82292 23536304 0 0 1760 416 12454 17736 89 6 3 2
17  0 73036 460620 82292 23538888 0 0 1292 968 12030 18333 90 7 2 1
13  4 73036 459764 82292 23539724 0 0 332 288 9722 14197 92 5 2 1
17  5 73036 457516 82292 23542176 0 0 1872 17752 10458 15465 91 5 2 1
19  4 73036 450804 82300 23545640 0 0 2980 640 10602 15621 90 6 2 2
24  0 73036 447660 82312 23547644 0 0 1736 10724 12401 15413 93 6 1 0
20  6 73036 444380 82320 23550692 0 0 2064 476 9008 10985 94 4 1 0
22  2 73036 442880 82328 23553640 0 0 2496 3156 10739 15211 93 5 1 1
11  1 73036 441448 82328 23555632 0 0 1452 3552 10812 15337 93 5 2 1
 6  2 73036 439812 82348 23557420 0 0 1052 1128 8603 10514 91 3 3 2
 6  3 73036 433456 82348 23560860 0 0 2484 656 7636 13033 68 4 14 14
 6  3 73036 433084 82348 23562628 0 0 1400 408 6046 11778 70 3 18 9
 5  0 73036 430776 82356 23564264 0 0 1108 1300 7549 13754 73 4 16 7
 5  2 73036 430124 82360 23565580 0 0 1016 2216 7844 14507 72 4 18 6
 4  2 73036 429652 82380 23567480 0 0 1168 2468 7694 15466 58 4 24 14
 6  2 73036 427304 82384 23569668 0 0 1132 752 5993 13606 49 5 36 10
 7  1 73036 423020 82384 23571932 0 0 1244 824 8085 18072 56 3 30 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 4  0 73036 420816 82392 23573824 0 0 1292 820 5370 10958 46 2 41 10
 9  1 73020 418048 82392 23576900 52 0 1632 2592 5931 11629 60 3 29 8
 4  2 73004 415164 82424 23578620 56 0 1812 4116 7503 14674 71 3 15 12
On Mar 16, 2009, at 4:19 PM, Dave Youatt wrote:
> Last column "wa" is % cpu time spent waiting (for IO to complete).
> 80s and 90s is pretty high, probably too high.
>
> Might also want to measure the IO/s performance of your RAID
> controller. From the descriptions, it will be much more important
> than long sequential reads/writes for characterizing your workload.
>
> There are also some disappointing HW RAID controllers out there.
> Generally, Areca and Promise are good, Adaptec is good depending on
> model, and the ones that Dell ships with their servers haven't had
> good reviews/reports.
>
>
> On 03/16/2009 01:10 PM, Joe Uhl wrote:
>> Here is vmstat 1 30. We are under peak load right now so I can
>> gather information from the real deal :)
>>
>> Had an almost complete lockup a moment ago, number of non-idle
>> postgres connections was 637. Going to drop our JDBC pool sizes a
>> bit and bounce everything.
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>> 12 35 95056 11102380 56856 14954948 3 4 669 541 1 2 23 3 54 19
>> 12 39 95056 11092484 56876 14963204 0 0 6740 1204 10066 13277 91 5 0 4
>>  8 42 95056 11081712 56888 14972244 0 0 8620 1168 10659 17020 78 6 0 15
>> 10 30 95052 11069768 56904 14982628 0 0 8944 976 9809 15109 81 6 1 12
>>  4 27 95048 11059576 56916 14991296 0 0 8852 440 7652 13294 63 4 2 32
>>  5 42 95048 11052524 56932 14996496 0 0 4700 384 6383 11249 64 4 4 28
>>  5 33 95048 11047492 56956 15001428 0 0 3852 572 6029 14010 36 4 5 56
>>  7 35 95048 11041184 56960 15005480 0 0 3964 136 5042 10802 40 3 1 56
>>  1 33 95048 11037988 56968 15009240 0 0 3892 168 3384 6479 26 1 3 69
>>  3 28 95048 11029332 56980 15015744 0 0 6724 152 4964 12844 11 2 8 79
>>  0 34 95048 11025880 56988 15020168 0 0 3852 160 3616 8614 11 1 6 82
>>  3 25 95048 10996356 57044 15044796 0 0 7892 456 3126 7115 4 3 8 85
>>  1 26 95048 10991692 57052 15050100 0 0 5188 176 2566 5976 3 2 12 83
>>  0 29 95048 10985408 57060 15054968 0 0 4200 80 2586 6582 4 1 12 83
>>  1 29 95048 10980828 57064 15058992 0 0 4560 64 2966 7557 7 2 6 85
>>  2 28 95048 10977192 57072 15063176 0 0 3860 72 2695 6742 11 1 7 81
>>  2 29 95048 10969120 57088 15067808 0 0 5084 84 3296 8067 14 1 0 84
>>  0 25 95048 10962096 57104 15072984 0 0 4440 500 2721 6263 12 1 6 80
>>  0 23 95044 10955320 57108 15079260 0 0 5712 232 2678 5990 6 1 6 87
>>  2 25 95044 10948644 57120 15084524 0 0 5120 184 3499 8143 20 3 9 69
>>  3 21 95044 10939744 57128 15090644 0 0 5756 264 4724 10272 32 3 5 60
>>  1 19 95040 10933196 57144 15095024 12 0 4440 180 2585 5244 13 2 15 70
>>  0 21 95040 10927596 57148 15098684 0 0 3248 136 2973 7292 8 1 9 81
>>  1 20 95040 10920708 57164 15104244 0 0 5192 360 1865 4547 3 1 9 87
>>  1 24 95040 10914552 57172 15105856 0 0 2308 16 1948 4450 6 1 1 93
>>  0 24 95036 10909148 57176 15110240 0 0 3824 152 1330 2632 3 1 6 90
>>  1 21 95036 10900628 57192 15116332 0 0 5680 180 1898 3986 4 1 11 84
>>  0 19 95036 10888356 57200 15121736 0 0 5952 120 2252 3991 12 1 8 79
>>  2 22 95036 10874336 57204 15128252 0 0 6320 112 2831 6755 5 2 8 85
>>  3 26 95036 10857592 57220 15134020 0 0 5124 216 3067 5296 32 6 3 59
>>
>>
>> Alan, my apologies if you get this twice. Didn't reply back to the
>> list on first try.
>>
>> On Mar 16, 2009, at 3:52 PM, Alan Hodgson wrote:
>>
>>> On Monday 16 March 2009, Joe Uhl <joeuhl(at)gmail(dot)com> wrote:
>>>> Right now (not under peak load) this server is running at 68% CPU
>>>> utilization and its SATA raid 10 is doing about 2MB/s writes and
>>>> 11MB/s reads. When I run dd I can hit 200+MB/s writes and 230+MB/s
>>>> reads, so we are barely using the available IO. Further when I run
>>>> dd the CPU utilization of that process only approaches 20%-30% of
>>>> one core.
>>>
>>> What does vmstat say when it's slow? The output of "vmstat 1 30"
>>> would be informative.
>>>
>>> note: dd is sequential I/O. Normal database usage is random I/O.
>>>
>>> --
>>> Even a sixth-grader can figure out that you can’t borrow money to
>>> pay off your debt
>>>
>>
>>
>
On Mar 16, 2009, at 4:35 PM, Greg Smith wrote:
> On Mon, 16 Mar 2009, Joe Uhl wrote:
>
>> Here is vmstat 1 30. We are under peak load right now so I can
>> gather information from the real deal
>
> Quite helpful, reformatting a bit and picking an informative section:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0 34 95048 11025880 56988 15020168 0 0 3852 160 3616 8614 11 1 6 82
>  3 25 95048 10996356 57044 15044796 0 0 7892 456 3126 7115 4 3 8 85
>  1 26 95048 10991692 57052 15050100 0 0 5188 176 2566 5976 3 2 12 83
>
> This says that your server is spending all its time waiting for I/O,
> actual CPU utilization is minimal. You're only achieving around
> 3-8MB/s of random I/O. That's the reality of what your disk I/O
> subsystem is capable of, regardless of what its sequential
> performance with dd looks like. If you were to run a more
> complicated benchmark like bonnie++ instead, I'd bet that your
> "seeks/second" results are very low, even though sequential
> read/write is fine.
>
> The Perc5 controllers have a pretty bad reputation for performance
> on this list, even in RAID10. Not much you can do about that beyond
> scrapping the controller and getting a better one.
>
> What you might do in order to reduce the total number of writes
> needed is some standard postgresql.conf tuning; see http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> What you could do here is increase shared_buffers,
> checkpoint_segments, and checkpoint_completion_target as described
> there. Having more buffers dedicated to the database and having
> fewer checkpoints can result in less random I/O showing up, as
> popular data pages will stay in RAM longer without getting
> written out so much.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com
> Baltimore, MD
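A sketch of the kind of postgresql.conf change Greg describes above
(the values here are illustrative placeholders, not recommendations
from the thread):

```
# Spread checkpoints out to cut bursts of random write I/O.
shared_buffers = 7680MB               # already raised earlier in the thread
checkpoint_segments = 100             # fewer, larger checkpoints
checkpoint_completion_target = 0.9    # smear checkpoint writes over the interval
```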