From: | Charles Hornberger <charlie(at)hss(dot)caltech(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: postmaster dead but backends still running? |
Date: | 2003-06-19 17:22:47 |
Message-ID: | Pine.LNX.4.53.0306191011140.3921@economex.caltech.edu |
Lists: | pgsql-admin |

On Tue, 17 Jun 2003, Tom Lane wrote:
> Charles Hornberger <charlie(at)hss(dot)caltech(dot)edu> writes:
> > Other things I perhaps ought to mention: Trying to stop the postmaster
> > using pg_ctl fails (unsurprisingly, since pg_ctl relies on
> > /var/pgsql/data/postmaster.pid, which contains a nonexistent PID); I
> > haven't tried to start a new postmaster yet, because the old backends
> > are hanging around.
>
> In theory a new postmaster would detect the old backends and refuse to
> start anyway. I don't trust that interlock unreservedly though. (But
> please test it while you have the opportunity...)

Unfortunately, our system administrator fixed this before I got a chance
to test further. I don't know how he went about restarting the server,
although whatever he did doesn't appear to have hurt anything. Would it
be useful to know exactly what steps he took?

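
For anyone checking the same symptom: a minimal sketch, in Python, of
testing whether the PID recorded in postmaster.pid still names a live
process. It assumes a Unix system and that the first line of the file
holds the PID; read_pid and pid_is_alive are hypothetical helper names,
not part of PostgreSQL or pg_ctl.

```python
import os


def read_pid(pid_file):
    """Return the PID stored on the first line of a postmaster.pid-style file."""
    with open(pid_file) as f:
        return int(f.readline().strip())


def pid_is_alive(pid):
    """Probe a PID with signal 0, which checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the PID file is stale
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True
```

Calling pid_is_alive(read_pid("/var/pgsql/data/postmaster.pid")) should
return False in the situation described above. A True result only shows
that some process has that PID, not that it is a postmaster.
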
> > Nor have I attempted to restart the web server, which might allow the
> > hanging-around backends to die by closing the old connections it's
> > holding to them. I'm tempted to go ahead and do this, though I'm not
> > sure whether I ought to until I've diagnosed what's going on right now.
>
> You will need to close all the existing connections before the new
> postmaster can be started. I'd recommend doing so sooner instead of
> later, because with no postmaster you aren't getting any checkpoints
> done, and your WAL space is going to start ballooning.
>
> As far as diagnosing the problem goes: if you have a postmaster log
> file, look to see if the postmaster wrote an ERROR or FATAL message
> before it exited. (Finding it among all the backend-level messages
> might be painful though.) Also look in the directory the postmaster
> was started in to see if there's a core file. Save away any evidence
> you can find before trying to start a new postmaster.

Interestingly, there are no messages in the log file, and I can't find a
core file -- in short, there's no evidence whatsoever, at least not that
I can find. (Though I am probably a pretty rotten detective.)

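
The log check suggested above can be sketched roughly as follows. This
is only an illustration: it assumes severity appears in the log as an
"ERROR:" or "FATAL:" tag, find_severe_lines is a hypothetical helper
name, and it makes no attempt to tell postmaster messages apart from
backend ones.

```python
import re

# Assumed severity tags; adjust the pattern to match the actual log format.
SEVERITY = re.compile(r"\b(ERROR|FATAL):")


def find_severe_lines(log_lines):
    """Return (line_number, text) pairs for lines carrying a severity tag."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if SEVERITY.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open log file (for example, with open(log_path) as f:
find_severe_lines(f), where log_path is wherever the postmaster's output
was redirected) yields the candidate lines together with their line
numbers, so the last few before the postmaster exited are easy to find.
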
However, I think I know the cause (though I haven't tested whether this
actually makes the postmaster die): a few hours before I noticed that
the postmaster was dead, one of the sysadmins made a typo that caused an
NFS mount to become unavailable -- the very NFS mount that held the
postgres executable (all our Solaris boxes share the same executables).
So the theory is that the postmaster tried to fork() a process using a
non-existent executable, and died as a result. Does this make any sense?

-Charlie

> Because the postmaster doesn't actually do much, crashes are pretty
> unusual. I'm interested in whatever you can find.
>
> regards, tom lane