Re: core system is getting unresponsive because over 300 cpu load

From:	pinker <pinker(at)onet(dot)eu>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: core system is getting unresponsive because over 300 cpu load
Date:	2017-10-10 22:28:52
Message-ID:	1507674532671-0.post@n3.nabble.com
Lists:	pgsql-general

Tomas Vondra-4 wrote
> What is "CPU load"? Perhaps you mean "load average"?

Yes, I wasn't exact: I mean system CPU usage. It can be seen here, in the
graph from yesterday's failure (after 6 p.m.):
<http://www.postgresql-archive.org/file/t342733/cpu.png>
As one can see, connection spikes follow CPU spikes...

Tomas Vondra-4 wrote
> Also, what are the basic system parameters (number of cores, RAM), it's
> difficult to help without knowing that.

I have actually written everything in the first post:
80 CPU and 4 sockets
over 500GB RAM

Tomas Vondra-4 wrote
> Well, 3M transactions over ~2h period is just ~450tps, so nothing
> extreme. Not sure how large the transactions are, of course.

There is quite a lot going on. Most of them are complicated stored procedures.
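
For reference, the throughput estimate quoted above is simple division. A
minimal sketch, assuming a window of exactly 2 hours (the quote only says
"~2h", so the result is approximate; the function name is illustrative):

```python
def transactions_per_second(total_transactions: int, window_seconds: float) -> float:
    """Average throughput over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_transactions / window_seconds


# 3M transactions over an assumed window of exactly 2 hours.
tps = transactions_per_second(3_000_000, 2 * 3600)
print(f"{tps:.0f} tps")
```

With exactly 2 hours this gives roughly 417 tps; the quoted ~450 tps
corresponds to a slightly shorter window. Either way it is an average and
says nothing about how heavy each transaction is.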

Tomas Vondra-4 wrote
> Something gets executed on the database. We have no idea what it is, but
> it should be in the system logs. And you should see the process in 'top'
> with large amounts of virtual memory ...

Yes, it would be much easier if it were just a single query at the top of
'top', but most of the CPU is eaten by the system itself and I'm not sure why.
I suppose it is NUMA related, because of the page table size and anon pages.

Tomas Vondra-4 wrote
> Another possibility is a run-away query that consumes a lot of work_mem.

That was exactly my first guess. work_mem is set to ~350MB and I see a lot of
stored procedures with unnecessary WITH clauses (i.e. materialization),
followed right away by an IN query over those results (a hash).

Tomas Vondra-4 wrote
> Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
> and then you can decide.

Thank you for the tip. I usually check it but hadn't here. The result is
0.992969610990056, so increasing it is rather pointless.
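
A minimal sketch of how such a ratio can be derived from the two counters
named in the quote. It assumes the usual definition, hits divided by hits
plus reads; the sample counter values are made up for illustration and are
not taken from this system:

```python
from typing import Optional

# Query one could run to fetch the counters (columns from pg_stat_database).
COUNTERS_SQL = "SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database"


def cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Fraction of block requests served from shared buffers."""
    total = blks_hit + blks_read
    if total == 0:
        return None  # no block activity recorded yet
    return blks_hit / total


# Made-up counters for illustration only.
ratio = cache_hit_ratio(blks_hit=992_970, blks_read=7_030)
print(ratio)
```

Note that blks_read counts blocks requested from the operating system, which
may still be served from the OS page cache rather than from disk.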

Tomas Vondra-4 wrote
> You may also make the bgwriter more aggressive - that won't really
> improve the hit ratio, it will only make enough room for the backends.

Yes, I probably will.

Tomas Vondra-4 wrote
> But I don't quite see how this could cause the severe problems you have,
> as I assume this is kinda regular behavior on that system. Hard to say
> without more data.

I can provide you with any data you need :)

regards


In response to

Re: core system is getting unresponsive because over 300 cpu load at 2017-10-10 21:12:13 from Tomas Vondra

Responses

Re: core system is getting unresponsive because over 300 cpu load at 2017-10-10 22:41:26 from John R Pierce
Re: core system is getting unresponsive because over 300 cpu load at 2017-10-10 23:23:26 from Tomas Vondra
Re: core system is getting unresponsive because over 300 cpu load at 2017-10-12 13:25:47 from Scott Marlowe
