From: | "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk> |
---|---|
To: | "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com> |
Cc: | shoaib <shoaibm(at)vmoksha(dot)com>, 'Martijn van Oosterhout' <kleptog(at)svana(dot)org>, gearond(at)cvc(dot)net, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Database server restarting |
Date: | 2003-05-06 18:00:34 |
Message-ID: | Pine.LNX.4.21.0305061838130.13508-100000@ponder.fairway2k.co.uk |
Lists: | pgsql-general |
On Tue, 6 May 2003, scott.marlowe wrote:
> Here's a little script that will run top every so often and log the output
> to a file you can read later when the machine's recovered.
>
> #!/bin/bash
> while true; do
>     top -bn 1 >>log.txt
>     sleep 60
> done
>
> Just run it in your home directory. Make sure your /home partition has
> enough space. Under heavy load each 60 seconds you'll be adding about 2k
> to 5k to that file. Change the sleep 60 to something smaller if you want
> it to run more often. No warranties implied, use at your own risk. :-)
The problem with that is that it starts up new processes on each iteration,
which may itself fail on a struggling machine. At the least you need to
redirect stderr to the log file as well. Should top fail to launch, the error
message would then give some help with the problem, though not as much as
actually having the output of top. It would be much better to just do a:
top -d 60 -b -n 600 > log.txt 2>&1
which would take snapshots for 10 hours, or just set a very large number
instead of 600 and interrupt it when wanted. The 60 second interval can easily
be changed then as well.
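As a rough sketch of the arithmetic, the count given to -n is just the wanted duration divided by the interval. The helper name top_iterations is made up for illustration and is not part of top:

```shell
#!/bin/bash
# Hypothetical helper: number of snapshots needed to cover $1 hours
# when top takes one snapshot every $2 seconds.
top_iterations() {
    echo $(( $1 * 3600 / $2 ))
}

# Example (not run here): ten hours at one snapshot per minute.
# top -d 60 -b -n "$(top_iterations 10 60)" > log.txt 2>&1
```

Halving the interval to 30 seconds doubles the count needed to cover the same period.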
Then, of course, if the issue is disk activity, swap or otherwise, there's also
vmstat. What about file descriptor usage? It's possible to get an estimate of
that by looking through /proc, in which case I'd say a simple shell script would
suffice, and never mind the possible failures to start programs like ls. Then
what if it's interrupt activity that's the problem? Not very likely on modern
hardware, but even 10Mbps ethernet could bring a system almost to its knees with
interrupt activity on older stuff.
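A minimal sketch of such a file descriptor script, assuming a Linux-style /proc with one fd directory per process. The function name count_fds and its optional root argument are illustrative only. It relies on shell globbing rather than ls, so the counting itself starts no new programs:

```shell
#!/bin/bash
# Print "<pid> <open fd count>" for each process directory under a
# proc-style tree. The root defaults to /proc; it is a parameter only
# so the function can be tried against a test directory.
count_fds() {
    local root=${1:-/proc} dir pid n entry
    for dir in "$root"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        pid=${dir##*/}
        n=0
        # The shell expands the glob itself, so nothing like ls is launched.
        for entry in "$dir"/fd/*; do
            # An unmatched glob stays literal and fails both tests.
            [ -e "$entry" ] || [ -L "$entry" ] || continue
            n=$((n + 1))
        done
        echo "$pid $n"
    done
}
```

Run as an ordinary user, this reports 0 for processes whose fd directory is not readable, so it would need to run as root to see the whole system. Appending its output to a log at intervals would show whether the total creeps upwards over the day.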
I think the important point in this is that there is something making the
system unstable, and the extra load produced by the postgresql cron jobs is
sufficient to make that something significant, where normally a daily reboot
prevents it from getting to that stage. So again, it's the question of
'why reboot daily?'
--
Nigel Andrews