Re: core system is getting unresponsive because over 300 cpu load

From: pinker <pinker(at)onet(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: core system is getting unresponsive because over 300 cpu load
Date: 2017-10-11 00:26:26
Message-ID: 1507681586390-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tomas Vondra-4 wrote
> I'm probably a bit dumb (after all, it's 1AM over here), but can you
> explain the CPU chart? I'd understand percentages (say, 75% CPU used)
> but what do the seconds / fractions mean? E.g. when the system time
> reaches 5 seconds, what does that mean?

hehe, no you've just spotted a mistake, it suppose to be 50 cores :)
out of 80 in total

Tomas Vondra-4 wrote
> Have you tried profiling using perf? That usually identifies hot spots
> pretty quickly - either in PostgreSQL code or in the kernel.

I was always afraid because of overhead, but maybe it's time to start ...

Tomas Vondra-4 wrote
> What I meant is that if the system evicts this amount of buffers all the
> time (i.e. there doesn't seem to be any sudden spike), then it's
> unlikely to be the cause (or related to it).

I was actually been thinking about scenario where different sessions want to
at one time read/write from or to many different relfilenodes, what could
cause page swap between shared buffers and os cache? we see that context
switches on cpu are increasing as well. kernel documentation says that using
page tables instead of Translation Lookaside Buffer (TLB) is very costly and
on some blogs have seen recomendations that using huge pages (so more
addresses can fit in TLB) will help here but postgresql, unlike oracle,
cannot use it for anything else than page buffering (so 16gb) ... so process
memory still needs to use 4k pages.
or memory fragmentation?

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message pinker 2017-10-11 00:38:47 Re: core system is getting unresponsive because over 300 cpu load
Previous Message Andres Freund 2017-10-11 00:18:06 Re: core system is getting unresponsive because over 300 cpu load