From: Bob Dusek <redusek(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance config help
Date: 2010-01-11 17:49:39
Message-ID: 61039b861001110949n699f0bd1pf3c878fa4c739891@mail.gmail.com
Lists: pgsql-performance
> > This is to be expected, to some extent, as we would expect some
> > performance degradation with higher utilization. But the hardware
> > doesn't appear to be very busy, and that's where we're hoping for
> > some help.
>
> It's likely in IO wait.
>
> >> What do the following commands tell you?
> >>
> >> iostat -x 10 (first iteration doesn't count)
> >
> > Here's some iostat output (the 3rd report from "iostat -x 10")...
> > this was taken while we were processing 256 simultaneous requests.
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >           34.29    0.00    7.09    0.03    0.00   58.58
> >
> > Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> > sda        0.00  112.20  0.00  133.40    0.00  1964.80    14.73     0.42   3.17   0.04   0.48
> > sda1       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > sda2       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > sda3       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > sda4       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > sda5       0.00  112.20  0.00  133.40    0.00  1964.80    14.73     0.42   3.17   0.04   0.48
> > sdb        0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > dm-0       0.00    0.00  0.00    0.40    0.00     3.20     8.00     0.00   0.00   0.00   0.00
> > dm-1       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> > dm-2       0.00    0.00  0.00   99.90    0.00   799.20     8.00     0.15   1.50   0.01   0.12
> > dm-3       0.00    0.00  0.00    0.60    0.00     4.80     8.00     0.00   0.33   0.17   0.01
> > dm-4       0.00    0.00  0.00  144.70    0.00  1157.60     8.00     0.46   3.17   0.02   0.35
> > dm-5       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
> >
> > The iowait seems pretty low, doesn't it?
>
> Depends, is that the first iteration of output? If so, ignore it and
> show me the second and further on. Same for vmstat... In fact, let
> them run for a minute or two and attach the results. OTOH, if that
> is the second or later set of output, then you're definitely not IO
> bound, and I don't see why the CPUs are not being better utilized.
>
I was probably not clear... the output I pasted was from the third
iteration of output from iostat. And the vmstat output I pasted was
from the sixth iteration.
> How many concurrent queries are you running when you take these
> measurements? Can you take them with lower and higher numbers of
> concurrent users and compare the two? Normally I'd be looking for
> context switching taking more and more time in a heavily loaded
> system, but I'm not seeing it in your vmstat numbers either.

We took those measurements with 256 concurrent requests being processed.
So, at most, we have 256 concurrent queries executed by our application.
There aren't any other applications using the db in our tests.
We can take some measurements at 40 concurrent requests and see where we
stand.
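
For reference, here is roughly how we'd capture those samples next time
(a sketch; the 10-second interval and 2-minute window are illustrative,
not anything we've settled on):

    # ~2 minutes of samples at 10s intervals; the first iostat report
    # is a since-boot average, so skip it when reading the results.
    iostat -x 10 13 > iostat.out &
    vmstat 10 13 > vmstat.out &
    wait
    # The "cs" column in vmstat.out shows context switches per second.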
> What are your settings for
>
> effective_cache_size
>
effective_cache_size = 24GB
> random_page_cost
>
Using the default...
#random_page_cost = 4.0
> work_mem
>
work_mem = 64MB
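
(Those values came straight from psql, for what it's worth; "ourdb"
below is a placeholder for our database name:)

    # Confirm the values the server is actually running with.
    psql -d ourdb <<'EOF'
    SHOW effective_cache_size;
    SHOW random_page_cost;
    SHOW work_mem;
    EOF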
> with your machine and the extra memory, you can probably uptune the
> work_mem to 8 Megs safely if it's at the default of 1MB. With a
> database that fits in RAM, you can often turn down random_page_cost to
> near 1.0 (1.2 to 1.4 is common for such scenarios.) And effective
> cache size being larger (in the 20G range) will hint the planner that
> it's likely to find everything it needs in ram somewhere and not on
> the disk.
>
So, we should probably try cranking our random_page_cost value down. When
we dump our db with "pg_dump --format=t", it's about 15 MB. We should be
able to keep the thing in memory.
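
We'll likely trial it per-session before touching postgresql.conf.
A sketch of the plan, with 1.2 as the commonly suggested starting point
rather than a measured value, and "ourdb" again as a placeholder:

    # Check the on-disk size, then test a representative query with a
    # lower random_page_cost in this session only.
    psql -d ourdb <<'EOF'
    SELECT pg_size_pretty(pg_database_size(current_database()));
    SET random_page_cost = 1.2;
    EXPLAIN ANALYZE SELECT count(*) FROM pg_class;  -- substitute a real app query
    EOF
    # If it helps: set random_page_cost = 1.2 in postgresql.conf
    # and reload with "pg_ctl reload".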
>
> There are several common bottlenecks you can try to tune away from.
> IO doesn't look like a problem for you. Neither does CPU load. So,
> then we're left with context switching time and memory to CPU
> bandwidth. If your CPUs are basically becoming data pumps then the
> speed of your FSB becomes VERY critical, and some older Intel mobos
> didn't have a lot of CPU to Memory bandwidth and adding CPUs made it
> worse, not better. More modern Intel chipsets have much faster CPU to
> Memory BW, since they're using the same kind of fabric switching that
> AMD uses on highly parallel machines.
>
Each CPU is 2.13 GHz, with 8MB Cache, and the FSB is 1066 MHz. Does that
bus speed seem slow?
It's hard to go to the money tree and say "we're only using about half of
your CPUs, but you need to get better ones."
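
Before making that pitch, we could at least try to quantify the
memory-bandwidth theory with something like the STREAM benchmark (a
sketch; assumes gcc is installed and stream.c has been downloaded from
http://www.cs.virginia.edu/stream/):

    # Build and run STREAM to measure sustained CPU-to-memory bandwidth.
    gcc -O3 -fopenmp stream.c -o stream
    ./stream    # reports Copy/Scale/Add/Triad bandwidth in MB/s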
> If your limit is your hardware, then the only solution is a faster
> machine. It may well be that a machine with dual fast Nehalem
> (2.4GHz+) quad core CPUs will be faster. Or 4 or 8 AMD CPUs with
> their faster fabric.
>
It sounds like we could spend less money on memory and more on faster hard
drives and faster CPUs.
But, man, that's a tough sell. This box is a giant, relative to anything
else we've worked with.