| From: | Greg Smith <gsmith(at)gregsmith(dot)com> | 
|---|---|
| To: | Jason Long <mailing(dot)list(at)supernovasoftware(dot)com> | 
| Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: Debugging infrequent pegged out CPU usage | 
| Date: | 2008-11-05 21:04:44 | 
| Message-ID: | Pine.GSO.4.64.0811051548060.2523@westnet.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
On Mon, 3 Nov 2008, Jason Long wrote:
> For some reason Postgres is pegging my CPU and I can barely log on to reboot 
> the machine.
Take a look at pg_stat_activity when this happens to see what's going on. 
Also, try running "top -c" to see what is going on (the -c displays extra 
information for the postgresql processes) and save a snapshot of what you 
see.
I normally put a quick script on the server to collect everything I want 
before even thinking of a restart when I get into this sort of common and 
ugly situation.  Here's a sample; run this next time and you'll be way 
ahead of where you are now at figuring out what's going wrong:
#!/bin/bash
date >> crash
psql -c "select * from pg_stat_activity" >> crash
top -c -b -n 1 >> crash
vmstat 1 5 >> crash
iostat 1 5 >> crash
psql -c "select * from pg_stat_activity" >> crash
date >> crash
> Can I limit Postgres to a certain number of cores or set the timeout on the 
> queries to a lower value?
You should at a minimum set log_min_duration_statement to figure out what 
the statements taking so long are.  At the root of this problem there's 
probably some bad queries.  I'd bet you start seeing an increase of those 
reported in the logs in advance of when the server becomes completely 
unresponsive.
> While there are relatively live few users the data is extremely 
> important and the users will not wait for me to see what is wrong. 
> They demand immediate resolution and the best I can do is reboot.
Well, if you don't have any idea how to fix the problem, that's reasonable 
I guess (although stopping just the postgresql process with "pg_ctl" is 
probably all that's really needed, a full reboot is just prolonging the 
downtime).  Once you've gotten some ideas for what to look at, like the 
little script above, you have to make the users wait until you're done 
running that before giving into pressure to fix something.  Otherwise 
you'll never solve the problem.
--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Sam Mason | 2008-11-05 21:07:51 | Re: How to use index in WHERE int = float | 
| Previous Message | Alvaro Herrera | 2008-11-05 20:29:41 | Re: Speeding up query |