Re: CentOS upgrade to 6.8 resulting in high load for PostgreSQL 9.4.6 server

From: Scott Mead <scottm(at)openscg(dot)com>
To: "Hazuka (CTR), Brad" <brad(dot)hazuka(dot)ctr(at)usgs(dot)gov>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: CentOS upgrade to 6.8 resulting in high load for PostgreSQL 9.4.6 server
Date: 2016-06-09 17:38:42
Message-ID: CAKq0gv+6qCtn+Q=BqTbwvkKBnF8rcC+vHiA6so4n7j8dP1bjvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Thu, Jun 9, 2016 at 8:27 AM, Hazuka (CTR), Brad <brad(dot)hazuka(dot)ctr(at)usgs(dot)gov
> wrote:

> I am hoping someone can help me out. I have a production server running
> CentOS 6.8 and PostgreSQL community edition 9.4.6. The application hitting
> the database is highly concurrent. When the database was running on the
> latest 6.7 CentOS OS I had no problems, we were seeing great throughput and
> my bottleneck was IO as I expected it to be. From the sar data my CPU
> overhead was ~10% and my disk usage was 99%. Memory was not using swap.
> After the SA upgraded the machine to 6.8 the profile changed. Now I am
> seeing a high CPU bound load on the system caused by postgres. disk usage
> is at ~50% and our application throughput has dropped dynamically. I was
> able to pull some vmstat and perf metrics which can be seen below. Any
> insight someone can offer would be great to return the system performance.
> Unfortunately rolling back to 6.7 is not an option because of security
> policy.
>

Any chance that you had tuned the IOScheduler or HugePages and now you've
lost those settings with an OS upgrade?

--Scott

>
> OS Info:
>
> CentOS 6.8 Linux 2.6.32-642.1.1.el6.x86_64 #1 SMP Tue May 31 21:57:07 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> DB Info:
>
> PostgreSQL 9.4.6
>
> HW Info:
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Thread(s) per core: 2
> Core(s) per socket: 4
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 44
> Model name: Intel(R) Xeon(R) CPU X5687 @ 3.60GHz
> Stepping: 2
> CPU MHz: 3600.000
> BogoMIPS: 7199.83
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 12288K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15
>
> total used free shared buffers cached
> Mem: 99052244 98472232 580012 8620496 47452 95264812
> -/+ buffers/cache: 3159968 95892276
> Swap: 4194300 11060 4183240
>
> Top information
> =================
> top - 10:22:07 up 1 day, 1:13, 3 users, load average: 15.64, 17.16,
> 17.39
> Tasks: 546 total, 18 running, 528 sleeping, 0 stopped, 0 zombie
> Cpu(s): 91.7%us, 3.8%sy, 0.0%ni, 3.8%id, 0.6%wa, 0.0%hi, 0.1%si,
> 0.0%st
> Mem: 99052244k total, 98744272k used, 307972k free, 47208k buffers
> Swap: 4194300k total, 11064k used, 4183236k free, 95472720k cached
>
> PID PPID TIME+ %CPU %MEM PR NI S VIRT SWAP RES UID COMMAND
>
>
>
> 21968 8814 995:06.03 86.6 1.1 20 0 R 8640m 272 1.1g 600 postgres:
> SELECT
>
> 21972 8814 994:41.35 85.9 1.1 20 0 R 8640m 272 1.1g 600 postgres:
> SELECT
>
> 20923 8814 994:53.56 85.0 1.8 20 0 R 8641m 296 1.7g 600 postgres:
> SELECT
>
> 7607 8814 16:39.25 80.1 8.0 20 0 R 8631m 268 7.6g 600 postgres:
> SELECT
>
> 21058 8814 956:59.78 79.1 3.1 20 0 R 8641m 296 2.9g 600 postgres:
> SELECT
>
> 7667 8814 17:22.53 77.1 0.9 20 0 R 8621m 288 855m 600 postgres:
> SELECT
>
> 20899 8814 995:15.39 76.5 4.4 20 0 R 8641m 292 4.2g 600 postgres:
> SELECT
>
> 7381 8814 28:59.06 75.8 0.9 20 0 R 8621m 288 865m 600 postgres:
> SELECT
>
> 21966 8814 995:13.87 74.5 4.4 20 0 R 8641m 272 4.2g 600 postgres:
> SELECT
>
> 7848 8814 10:22.82 73.2 0.9 20 0 R 8621m 288 840m 600 postgres:
> SELECT
>
> 7806 8814 13:50.43 71.9 0.9 20 0 S 8621m 288 849m 600 postgres:
> idle
>
> 7645 8814 18:08.60 70.9 0.9 20 0 R 8621m 288 852m 600 postgres:
> SELECT
>
> 7619 8814 19:34.71 70.3 0.9 20 0 R 8621m 288 856m 600 postgres:
> SELECT
>
> 7665 8814 17:40.34 69.6 0.9 20 0 S 8621m 288 855m 600 postgres:
> COMMIT
>
> 7843 8814 13:35.96 68.6 0.9 20 0 R 8621m 288 849m 600 postgres:
> SELECT
>
> 7264 8814 31:09.93 67.3 1.5 20 0 R 8633m 268 1.5g 600 postgres:
> SELECT
>
> 7675 8814 15:00.69 66.7 0.9 20 0 R 8621m 288 849m 600 postgres:
> SELECT
>
> 7574 8814 23:30.92 64.4 0.9 20 0 D 8621m 288 857m 600 postgres:
> SELECT
>
> 6977 8814 34:06.98 64.1 2.8 20 0 R 9106m 268 2.7g 600 postgres:
> SELECT
>
> 21973 8814 5:39.97 1.3 1.1 20 0 S 8640m 272 1.1g 600 postgres:
> idle
>
>
> 9212 1 13:13.12 1.0 0.0 20 0 S 44120 540 1140 600
> /opt/pgbouncer/bin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
>
>
>
>
> VMStat Information
> =================
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu-----
> r b swpd free buff cache si so bi bo in cs us sy id
> wa st
> 20 0 11072 749604 47000 95163200 0 0 75 230 18 26 82 0
> 15 2 0
> 15 0 11072 723244 47016 95176920 0 0 38 887 86610 33495 84
> 6 10 0 0
> 21 0 11072 708256 47024 95190704 0 0 28 3856 86047 23593 85
> 5 9 1 0
> 20 0 11072 702536 47040 95196160 0 0 1 3890 82383 10971 89
> 4 6 1 0
> 19 1 11072 696464 47048 95201504 0 0 17 522 84030 11061 90
> 4 5 0 0
> 21 0 11072 688740 47064 95206920 0 0 31 2238 84497 27806 87
> 5 8 0 0
>
> perf top info
> =================
> Overhead Shared Object Symbol
> 12.83% postgres [.] heap_hot_search_buffer
> 6.82% postgres [.] ExecScanHashBucket
> 6.67% postgres [.] slot_deform_tuple
> 4.74% postgres [.] LWLockAcquire
> 3.81% postgres [.] pg_next_dst_boundary
> 3.43% postgres [.] texteq
> 2.97% postgres [.] _bt_checkkeys
> 2.79% postgres [.] j2date
> 2.76% postgres [.] LWLockRelease
> 2.69% postgres [.] slot_getattr
> 2.66% postgres [.] ExecMakeFunctionResultNoSets
> 2.48% postgres [.] DetermineTimeZoneOffsetInternal
> 2.25% postgres [.] hash_search_with_hash_value
> 2.04% postgres [.] tm2timestamp
> 2.04% postgres [.] ExecEvalParamExtern
> 1.99% postgres [.] dt2time
> 1.67% postgres [.] date2j
> 1.54% postgres [.] pgstat_end_function_usage
> 1.54% postgres [.] FunctionCall2Coll
> 1.52% postgres [.] ExecQual
> 1.48% postgres [.] timestamp2tm
> 1.43% postgres [.] index_fetch_heap
> 1.33% postgres [.] index_getnext_tid
> 1.22% postgres [.] _bt_next
> 1.18% postgres [.] ExecEvalNullTest
> 1.15% postgres [.] ExecScan
> 1.14% postgres [.] btgettuple
> 0.99% postgres [.] IndexNext
> 0.95% postgres [.] timestamp2timestamptz
> 0.94% postgres [.] ReleaseAndReadBuffer
> 0.93% postgres [.] PinBuffer
> 0.92% postgres [.] ExecEvalScalarVarFast
> 0.87% postgres [.] ExecStoreTuple
> 0.85% postgres [.] LockBuffer
> 0.82% postgres [.] heap_page_prune_opt
> 0.78% libc-2.12.so [.] __memcmp_sse4_1
> 0.74% postgres [.] toast_raw_datum_size
> 0.74% postgres [.] pg_detoast_datum_packed
> 0.72% postgres [.] ExecEvalRelabelType
> 0.68% postgres [.] CheckForSerializableConflictOut
> 0.68% postgres [.] HeapTupleSatisfiesMVCC
> 0.57% postgres [.] index_getnext
> 0.53% postgres [.] ReadBuffer_common
> 0.51% postgres [.] pgstat_init_function_usage
>
> IO information
> =====================
>
> Tme DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz
> await svctm %util
> 08:00:01 dev253-5 59.13 1.89 1564.72 26.50 2.57
> 43.43 5.90 34.90
> 08:10:01 dev253-5 58.47 2.19 1555.84 26.65 2.20
> 37.59 6.05 35.38
> 08:20:01 dev253-5 60.72 36.18 1598.49 26.92 2.34
> 38.53 5.92 35.94
> 08:30:01 dev253-5 60.21 12.68 1562.41 26.16 2.18
> 36.21 5.97 35.97
> 08:40:01 dev253-5 60.77 367.49 2624.93 49.25 2.94
> 48.33 6.09 36.98
> 08:50:01 dev253-5 63.26 217.68 2644.81 45.25 5.41
> 85.45 5.82 36.80
> 09:00:01 dev253-5 60.33 1.41 1612.09 26.75 2.16
> 35.80 5.93 35.77
> 09:10:01 dev253-5 58.45 2.96 1586.15 27.19 3.29
> 56.23 6.12 35.77
> 09:20:01 dev253-5 89.26 1139.63 3323.51 50.00 3.78
> 42.35 4.51 40.28
> 09:30:01 dev253-5 92.84 1320.94 4093.75 58.32 4.21
> 45.30 4.15 38.57
> 09:40:01 dev253-5 82.81 998.29 4179.65 62.53 5.09
> 61.35 4.69 38.85
> 09:50:01 dev253-5 58.08 22.32 5549.46 95.93 5.22
> 89.88 6.14 35.67
> 10:00:01 dev253-5 60.98 50.19 10181.89 167.78 8.34
> 136.79 6.04 36.81
> 10:10:01 dev253-5 61.11 256.26 2992.49 53.16 3.12
> 51.10 5.61 34.29
> Average: dev253-5 83.02 4394.40 2225.11 79.73 3.18
> 38.35 4.69 38.93
>
>
> Bradley J. Hazuka
> Database Administrator
> Innovate!, Inc.
>

--
--
Scott Mead
Sr. Architect
*OpenSCG <http://openscg.com>*
http://openscg.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Ray Stell 2016-06-09 19:41:47 Re: Server problem
Previous Message Scott Whitney 2016-06-09 17:13:32 Re: Server problem