Re: core system is getting unresponsive because over 300 cpu load

From: Andres Freund <andres(at)anarazel(dot)de>
To: pinker <pinker(at)onet(dot)eu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: core system is getting unresponsive because over 300 cpu load
Date: 2017-10-11 00:18:06
Message-ID: 20171011001806.n7biw2lps2iq3yt7@alap3.anarazel.de
Lists: pgsql-general

Hi,

On 2017-10-10 13:40:07 -0700, pinker wrote:
> and the total number of connections are increasing very fast (but I suppose
> it's the symptom not the root cause of cpu load) and exceed max_connections
> (1000).

Others mentioned already that that's worth improving.

> System:
> * CentOS Linux release 7.2.1511 (Core)
> * Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux

Some versions of this kernel have had serious problems with transparent
hugepages. I'd try turning that off. Also make sure zone_reclaim_mode is
disabled; I think it defaults to off even in that version.
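
A minimal sketch of how both settings could be checked, assuming the
usual sysfs/procfs locations (/sys/kernel/mm/transparent_hugepage/enabled
and /proc/sys/vm/zone_reclaim_mode). The helper names are made up for
illustration, and the paths may differ on other distributions:

```python
from pathlib import Path

# Assumed locations; adjust if your kernel exposes them elsewhere.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")
ZRM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def parse_thp(text):
    """Return the active THP mode, i.e. the bracketed word in
    a line such as 'always madvise [never]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def zone_reclaim_enabled(text):
    """zone_reclaim_mode is a bitmask; any non-zero value means enabled."""
    return int(text.strip()) != 0


def read_or_none(path):
    """Read a small sysfs/procfs file, or return None if unavailable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    thp = read_or_none(THP_PATH)
    zrm = read_or_none(ZRM_PATH)
    print("transparent hugepages:",
          parse_thp(thp) if thp is not None else "unknown")
    print("zone_reclaim_mode enabled:",
          zone_reclaim_enabled(zrm) if zrm is not None else "unknown")
```

If the first line reports anything other than "never", or the second
reports True, those are the settings I'd change first.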

> * postgresql95-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
>
> * 4 sockets/80 cores

9.6 has quite a few scalability improvements over 9.5. I don't know
whether it's feasible for you to upgrade, but if so, it's worth trying.

How about taking a perf profile to investigate?
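
As a rough sketch of what that could look like: the helpers below build
a system-wide perf invocation with call graphs and a text report. The
function names, the 30 second duration and the perf.data file name are
assumptions for illustration. It needs perf installed and usually root:

```python
import subprocess


def perf_record_cmd(seconds=30, output="perf.data"):
    """Command that samples all CPUs (-a) with call graphs (-g)
    for `seconds` seconds, writing the samples to `output`."""
    return ["perf", "record", "-a", "-g", "-o", output,
            "--", "sleep", str(seconds)]


def perf_report_cmd(output="perf.data"):
    """Command that prints a plain-text report from `output`."""
    return ["perf", "report", "-i", output, "--stdio"]


def profile(seconds=30, output="perf.data"):
    """Record a profile and return the text report.
    Raises CalledProcessError if perf fails."""
    subprocess.run(perf_record_cmd(seconds, output), check=True)
    result = subprocess.run(perf_report_cmd(output), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling profile() while the load is high would be the interesting case;
a profile taken during a quiet period won't say much.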

> * vm.dirty_background_bytes = 0
> * vm.dirty_background_ratio = 2
> * vm.dirty_bytes = 0
> * vm.dirty_expire_centisecs = 3000
> * vm.dirty_ratio = 20
> * vm.dirty_writeback_centisecs = 500

I'd suggest monitoring /proc/meminfo for the amounts of Dirty and
Writeback memory, and seeing whether rapid changes therein coincide with
periods of slowdown.
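
A small sketch of such monitoring, assuming the standard /proc/meminfo
layout ("Name:   value kB"). The function names, interval and sample
count are arbitrary choices for illustration:

```python
import time


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Extract the requested fields (in kB) from /proc/meminfo content.
    Names are matched exactly, so WritebackTmp is not picked up."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields and rest.split():
            values[name] = int(rest.split()[0])
    return values


def monitor(interval=1.0, samples=60, path="/proc/meminfo"):
    """Print timestamped Dirty/Writeback values every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            values = parse_meminfo(f.read())
        print(time.strftime("%H:%M:%S"),
              "Dirty=%d kB" % values.get("Dirty", 0),
              "Writeback=%d kB" % values.get("Writeback", 0),
              flush=True)
        time.sleep(interval)
```

Running monitor() in a separate terminal and redirecting the output to a
file gives timestamps that can be compared against the times of the
slowdowns.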

Greetings,

Andres Freund
