From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Dave Owens <dave(at)teamunify(dot)com>
Cc: Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>, Tony Kay <tony(at)teamunify(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: CPU spikes and transactions
Date: 2014-05-14 13:48:22
Message-ID: CAHyXU0yXYpCXN4=81ZDRQu-oGzrcq2qNAXDpyz4oiQPPAGk4ew@mail.gmail.com
Lists: pgsql-performance
On Tue, May 13, 2014 at 6:04 PM, Dave Owens <dave(at)teamunify(dot)com> wrote:
> Hi,
>
> Apologies for resurrecting this old thread, but it seems like this is
> better than starting a new conversation.
>
> We are now running 9.1.13 and have doubled the CPU and memory. So 2x
> 16-core Opteron 6276 (32 cores total), and 64GB memory. shared_buffers set
> to 20G, effective_cache_size set to 40GB.
>
> We were able to record perf data during the latest incident of high CPU
> utilization. perf report is below:
>
> Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
>  44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
>  15.03%  postmaster  postgres           [.] 0x00000000002ea937
>   3.14%  postmaster  postgres           [.] s_lock
>   2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
>   2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
>   1.75%  postmaster  postgres           [.] hash_search_with_hash_value
>   1.25%  postmaster  postgres           [.] ExecScanHashBucket
>   1.20%  postmaster  postgres           [.] SHMQueueNext
>   1.05%  postmaster  postgres           [.] slot_getattr
>   1.04%  init        [kernel.kallsyms]  [k] native_safe_halt
>   0.73%  postmaster  postgres           [.] LWLockAcquire
>   0.59%  postmaster  [kernel.kallsyms]  [k] page_fault
>   0.52%  postmaster  postgres           [.] ExecQual
>   0.40%  postmaster  postgres           [.] ExecStoreTuple
>   0.38%  postmaster  postgres           [.] ExecScan
>   0.37%  postmaster  postgres           [.] check_stack_depth
>   0.35%  postmaster  postgres           [.] SearchCatCache
>   0.35%  postmaster  postgres           [.] CheckForSerializableConflictOut
>   0.34%  postmaster  postgres           [.] LWLockRelease
>   0.30%  postmaster  postgres           [.] _bt_checkkeys
>   0.28%  postmaster  libc-2.12.so       [.] memcpy
>   0.27%  postmaster  [kernel.kallsyms]  [k] get_pageblock_flags_group
>   0.27%  postmaster  postgres           [.] int4eq
>   0.27%  postmaster  postgres           [.] heap_page_prune_opt
>   0.27%  postmaster  postgres           [.] pgstat_init_function_usage
>   0.26%  postmaster  [kernel.kallsyms]  [k] _spin_lock
>   0.25%  postmaster  postgres           [.] _bt_compare
>   0.24%  postmaster  postgres           [.] pgstat_end_function_usage
>
> ...please let me know if we need to produce the report differently to be
> useful.
>
> We will begin reducing shared_buffers incrementally over the coming days.
>
This is definitely pointing at THP (transparent huge page) compaction, which
is increasingly emerging as a possible culprit for suddenly occurring (and
just as suddenly resolving) CPU spikes. The evidence I see is:
*) Lots of time in kernel
*) "compaction_alloc"
*) otherwise normal postgres profile (not lots of time in s_lock, LWLock,
or other weird things)
Please check whether THP is enabled (see here:
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadaoop-workloads/
which discusses THP's impact on Hadoop and various other workloads). If it
is enabled, consider disabling it...this will revert to pre-Linux 6 behavior
(a rough sketch of how to check and disable it is below). If you are going
to attack this from the point of view of lowering shared_buffers, do not
bother with incremental steps...head straight for 2GB or it's unlikely the
problem will be fixed. THP compaction is not a postgres problem...MySQL is
affected, as are other server platforms. If THP is indeed causing the
problem, it couldn't hurt to get on the horn with the Linux guys. Last I
heard they claimed this kind of thing was fixed, but I don't know where
things stand now.
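A minimal way to check and disable THP at runtime is something like this (a
rough sketch, not tested on your box; on RHEL/CentOS 6 kernels the sysfs
directory is usually redhat_transparent_hugepage rather than
transparent_hugepage, so adjust the path for your distro):

  # show current settings, e.g. "[always] madvise never"
  cat /sys/kernel/mm/transparent_hugepage/enabled
  cat /sys/kernel/mm/transparent_hugepage/defrag

  # as root, turn THP and its defrag/compaction off on the fly
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag

These settings do not survive a reboot, so the same echo lines typically go
into something like /etc/rc.local if you want them to stick.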
merlin