Re: Database server restarting

From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: shoaib <shoaibm(at)vmoksha(dot)com>
Cc: 'Martijn van Oosterhout' <kleptog(at)svana(dot)org>, gearond(at)cvc(dot)net, pgsql-general(at)postgresql(dot)org
Subject: Re: Database server restarting
Date: 2003-05-06 07:44:05
Message-ID: Pine.LNX.4.21.0305060832030.10245-100000@ponder.fairway2k.co.uk
Lists: pgsql-general

On Tue, 6 May 2003, shoaib wrote:

> There are some cron jobs running at the same time...
> One server does SSH into our application server and one cron job is
> reading the DB and writing some data into flat files. But by the time
> this problem is happening these jobs are not writing any data. Last
> night when the server went down the other server was trying to do SSH and
> probably it was running some cron job and a heavy DB process was
> running. I cannot do a top because I cannot log in to the server even from
> console.

Do you mean you have no login privileges on the machine, or that you are only
trying to log in once you see a problem? If the former, then I can't see how
there's any way you can make progress with this. If the latter, forget that;
it's not helping, since you are unable to get the list of running processes.
What you should do is log in _now_, run 'top' and leave it running. It may be
that when the problem occurs the session running the top will stop and so show
the information from that time. However, it may also be that it doesn't stop
and when you come into the office n hours later you find it merrily ticking
away showing you the current information. Therefore, investigate ways to log
the information to a file in case you aren't sat there when the problem occurs.
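
A minimal sketch of one way to log that information, in Python. It assumes a
top that supports batch mode ('top -b -n 1'); the function names, the log path
and the interval are made up for illustration and are not from this thread.
Each snapshot is flushed and fsync'ed so that the last one written before a
hang has a chance of being on disk.

```python
import os
import subprocess
import time


def append_snapshot(log_path, command=("top", "-b", "-n", "1")):
    """Run `command` once and append its timestamped output to log_path.

    The default command assumes a top with batch mode; pass another
    command (e.g. ps) if that assumption does not hold on your system.
    """
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
        output = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of crashing the monitor itself.
        output = f"snapshot failed: {exc}\n"

    with open(log_path, "a") as log:
        log.write(f"=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===\n")
        log.write(output)
        log.flush()
        os.fsync(log.fileno())  # push the snapshot to disk before a possible hang


def monitor(log_path, interval_seconds=60, iterations=None):
    """Take a snapshot every interval_seconds; iterations=None means forever."""
    count = 0
    while iterations is None or count < iterations:
        append_snapshot(log_path)
        count += 1
        time.sleep(interval_seconds)
```

Calling monitor("/var/tmp/top-snapshots.log") (a hypothetical path) from a
session started before the problem hour leaves a file you can read after the
reboot; the last entry shows what was running shortly before the machine
stopped responding.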

Also take a look at procinfo, it may be helpful as well.

One thing that might be a problem is the number of open file descriptors; you
could be running into the system limit on those. That sort of thing can
sometimes make a system unstable.
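
As a rough way to check this on Linux, the kernel reports system-wide file
handle counts in /proc/sys/fs/file-nr as three numbers: allocated handles,
allocated but unused handles, and the maximum. The sketch below (Python,
function names are illustrative only) parses that file; using it here is my
assumption about how to watch the limit, not something established in this
thread.

```python
def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    Returns (handles_in_use, maximum). On newer kernels the second
    field is always 0, so in_use equals the allocated count there.
    """
    allocated, unused, maximum = (int(field) for field in text.split()[:3])
    return allocated - unused, maximum


def fd_usage_fraction(text):
    """Fraction of the system-wide file handle limit currently in use."""
    in_use, maximum = parse_file_nr(text)
    return in_use / maximum


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse the live counters (Linux with /proc mounted only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())
```

Logging read_file_nr() alongside the process list at regular intervals would
show whether the in-use count climbs towards the maximum before the machine
becomes unresponsive.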

I'd still be interested to know whether the hardware has been tested
properly. Are there any known problems with RH 7.3's kernel and your particular
hardware, such as the RAID device?

One interesting thing you say, though: the same thing happens on a second
server. That to me suggests either a kernel/hardware problem, such as with the
RAID, or a bug in your own software. Perhaps an endless loop? Perhaps endlessly
trying to obtain a file descriptor? A process with heavy CPU usage shouldn't
bring the machine down, but it can make it look very unresponsive.

>
> Regards
> shaoib
>
>
> -----Original Message-----
> From: Martijn van Oosterhout [mailto:kleptog(at)svana(dot)org]
> Sent: Tuesday, May 06, 2003 2:40 PM
> To: shoaib
> Cc: gearond(at)cvc(dot)net; 'Nigel J. Andrews'; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Database server restarting
>
> On Tue, May 06, 2003 at 02:28:57PM +0800, shoaib wrote:
> > When I say it hangs, I mean I am not even able to log in at the server
> > console.
> > No ssh, no login from remote machines.
>
> Well, that's not postgresql's fault. It can't hang a machine like that. You
> should look elsewhere for the exact cause. I'm assuming here that consoles
> that are still logged in don't respond either? Maybe leave a top running to
> capture the list of processes just before it dies? Any cronjobs about the
> time it dies?
>
> What other processes run at about that time?
>

--
Nigel J. Andrews
