From: Evgeny Shishkin <itparanoia(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:33:56
Message-ID: B5EA62D3-A446-4A1F-BC4E-4B8290FEE1C2@gmail.com
Lists: pgsql-performance
On Oct 9, 2012, at 1:45 AM, Craig James <cjames(at)emolecules(dot)com> wrote:
> This is driving me crazy. A new server, virtually identical to an old one, has 50% of the performance with pgbench. I've checked everything I can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>
> memory: 12 GB DDR ECC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgresql.conf files are identical; the diffs from the default file are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
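Identical files don't guarantee identical running settings (a stray include, environment, or ALTER could differ). A quick sketch to verify, assuming the same test user can connect on both boxes: dump the non-default settings from pg_settings on each host and diff the two files.

    # sketch: dump non-default settings on each box, then diff the files
    psql -U test -At -c "SELECT name || '=' || setting FROM pg_settings WHERE source <> 'default' ORDER BY name" > settings.$(hostname).txt
    # then: diff settings.old.txt settings.new.txt
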
> Note that the old server is in production and was serving a light load while this test was running, so in theory it should be slower, not faster, than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
> -c -t TPS
> 5 20000 3777
> 10 10000 2622
> 20 5000 3759
> 30 3333 5712
> 40 2500 5953
> 50 2000 6141
>
> New server
> -c -t TPS
> 5 20000 2733
> 10 10000 2783
> 20 5000 3241
> 30 3333 2987
> 40 2500 2739
> 50 2000 2119
On the new server PostgreSQL does not scale at all. Looks like contention.
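If it is lock contention, it should show up as waiting backends while pgbench runs. A rough sketch (on 8.4, pg_stat_activity has a boolean "waiting" column; the user name matches your test setup):

    # sample waiting backends once a second while pgbench runs
    while true; do
        psql -U test -At -c "SELECT now(), count(*) FROM pg_stat_activity WHERE waiting"
        sleep 1
    done

A consistently non-zero count on the new box but not the old one would point at lock waits rather than raw I/O.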
>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++. The xlog disks were almost identical in performance. The RAID10 pg-data disks looked like this:
>
> Old server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> xenon 24064M 687 99 203098 26 81904 16 3889 96 403747 31 737.6 31
> Latency 20512us 469ms 394ms 21402us 396ms 112ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> xenon -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 15953 27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 43291us 857us 519us 1588us 37us 178us
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,++\
> +,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zinc 24064M 862 99 212143 54 96008 14 4921 99 279239 17 752.0 23
> Latency 15613us 598ms 597ms 2764us 398ms 215ms
> Version 1.96 ------Sequential Create------ --------Random Create--------
> zinc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 20380 26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency 487us 627us 407us 972us 29us 262us
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++\
> ,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are interesting.
>
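The one number that does stand out is sequential block input: ~403 MB/s on the old array vs ~279 MB/s on the new one. You could cross-check that outside bonnie++ with a plain direct read; the device path below is an assumption, substitute the RAID10 data volume on each box:

    # sketch: raw sequential read from the data array, bypassing the page cache
    dd if=/dev/sdb of=/dev/null bs=1M count=8192 iflag=direct

If dd reproduces the gap, look at the 3ware unit settings (read-ahead, stripe size, cache policy via tw_cli) rather than at postgres.
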
> One dramatic difference I noted via vmstat. On the old server, the I/O load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 2 71800 2117612 17940 9375660 0 0 82948 81944 1992 1341 1 3 86 10
> 0 2 71800 2113328 17948 9383896 0 0 76288 75806 1751 1167 0 2 86 11
> 0 1 71800 2111004 17948 9386540 92 0 93324 94232 2230 1510 0 4 86 10
> 0 1 71800 2106796 17948 9387436 114 0 67698 67588 1572 1088 0 2 87 11
> 0 1 71800 2106724 17956 9387968 50 0 81970 85710 1918 1287 0 3 86 10
> 1 1 71800 2103304 17956 9390700 0 0 92096 92160 1970 1194 0 4 86 10
> 0 2 71800 2103196 17976 9389204 0 0 70722 69680 1655 1116 1 3 86 10
> 1 1 71800 2099064 17980 9390824 0 0 57346 57348 1357 949 0 2 87 11
> 0 1 71800 2095596 17980 9392720 0 0 57344 57348 1379 987 0 2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 0 4518352 12004 7167000 0 0 118894 120838 2613 1539 0 2 93 5
> 0 1 0 4517252 12004 7167824 0 0 52116 53248 1179 793 0 1 94 5
> 0 1 0 4515864 12004 7169088 0 0 46764 49152 1104 733 0 1 91 7
> 0 1 0 4515180 12012 7169764 0 0 32924 30724 750 542 0 1 93 6
> 0 1 0 4514328 12016 7170780 0 0 42188 45056 1019 664 0 1 90 9
> 0 1 0 4513072 12016 7171856 0 0 67528 65540 1487 993 0 1 96 4
> 0 1 0 4510852 12016 7173160 0 0 56876 57344 1358 942 0 1 94 5
> 0 1 0 4500280 12044 7179924 0 0 91564 94220 2505 2504 1 2 91 6
> 0 1 0 4495564 12052 7183492 0 0 102660 104452 2289 1473 0 2 92 6
> 0 1 0 4492092 12052 7187720 0 0 98498 96274 2140 1385 0 2 93 5
> 0 1 0 4488608 12060 7190772 0 0 97628 100358 2176 1398 0 1 94 4
> 1 0 0 4485880 12052 7192600 0 0 112406 114686 2461 1509 0 3 90 7
> 1 0 0 4483424 12052 7195612 0 0 64678 65536 1449 948 0 1 91 8
> 0 1 0 4480252 12052 7199404 0 0 99608 100356 2217 1452 0 1 96 3
>
Also note the difference in the free/cache distribution, unless you took these numbers at completely different stages of the bonnie++ run.
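To compare like with like, you could capture vmstat on both machines from the same starting point; a minimal sketch, where the target directory is a placeholder and the size matches your bonnie++ runs:

    # log vmstat for the whole run, started together with bonnie++
    vmstat 5 > vmstat.$(hostname).log &
    VMSTAT_PID=$!
    bonnie++ -d /data/bonnie -s 24064 -u postgres
    kill $VMSTAT_PID

Then the free/cache columns line up stage by stage instead of being random snapshots.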
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>