From: | Scott Mead <scottm(at)openscg(dot)com> |
---|---|
To: | Brian Fehrle <brianf(at)consistentstate(dot)com> |
Cc: | pgsql-general General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Server hitting 100% CPU usage, system comes to a crawl. |
Date: | 2011-10-27 20:27:19 |
Message-ID: | CAKq0gvKw_dvXMYme+MG=Aj8ENoPFug+u_Jvx5nXgLig-G32JRw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Oct 27, 2011 at 2:39 PM, Brian Fehrle <brianf(at)consistentstate(dot)com>wrote:
> Hi all, need some help/clues on tracking down a performance issue.
>
> PostgreSQL version: 8.3.11
>
> I've got a system that has 32 cores and 128 gigs of ram. We have connection
> pooling set up, with about 100 - 200 persistent connections open to the
> database. Our applications then use these connections to query the database
> constantly, but when a connection isn't currently executing a query, it's
> <IDLE>. On average, at any given time, there are 3 - 6 connections that are
> actually executing a query, while the rest are <IDLE>.
>
Remember, when you read pg_stat_activity, it is showing you query activity
from that exact specific moment in time. Just because it looks like only
3-6 connections are executing, doesn't mean that 200 aren't actually
executing < .1ms statements. With such a beefy box, I would see if you can
examine any stats from your connection pooler to find out how many
connections are actually getting used.
>
> About once a day, queries that normally take just a few seconds slow way
> down, and start to pile up, to the point where instead of just having 3-6
> queries running at any given time, we get 100 - 200. The whole system comes
> to a crawl, and looking at top, the CPU usage is 99%.
>
> Looking at top, I see no SWAP usage, very little IOWait, and there are a
> large number of postmaster processes at 100% cpu usage (makes sense, at this
> point there are 150 or so queries currently executing on the database).
>
> Tasks: 713 total, 44 running, 668 sleeping, 0 stopped, 1 zombie
> Cpu(s): 4.4%us, 92.0%sy, 0.0%ni, 3.0%id, 0.0%wa, 0.0%hi, 0.3%si,
> 0.2%st
> Mem: 134217728k total, 131229972k used, 2987756k free, 462444k buffers
> Swap: 8388600k total, 296k used, 8388304k free, 119029580k cached
>
>
> In the past, we noticed that autovacuum was hitting some large tables at
> the same time this happened, so we turned autovacuum off to see if that was
> the issue, and it still happened without any vacuums running.
>
That was my next question :)
>
> We also ruled out checkpoints being the cause.
>
> How exactly did you rule this out? Just because a checkpoint is over
doesn't mean that it hasn't had a negative effect on the OS cache. If
you're stuck going to disk, that could be hurting you (that being said, you
do point to a low I/O wait above, so you're probably correct in ruling this
out).
>
> I'm currently digging through some statistics I've been gathering to see if
> traffic increased at all, or remained the same when the slowdown occurred.
> I'm also digging through the logs from the postgresql cluster (I increased
> verbosity yesterday), looking for any clues. Any suggestions or clues on
> where to look for this to see what can be causing a slowdown like this would
> be greatly appreciated.
>
> Are you capturing table-level stats from pg_stat_user_[tables | indexes]?
Just because a server doesn't look busy doesn't mean that you're not doing
1000 index scans per second returning 1000 tuples each time.
--Scott
> Thanks,
> - Brian F
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Schreyer | 2011-10-27 20:31:41 | Custom data type in C with one fixed and one variable attribute |
Previous Message | Brian Fehrle | 2011-10-27 20:15:25 | Re: Server hitting 100% CPU usage, system comes to a crawl. |