Re: It happened again: Server hung up solid

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: It happened again: Server hung up solid
Date: 2000-05-08 00:35:33
Message-ID: 25204.957746133@sss.pgh.pa.us
Lists: pgsql-hackers

The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> Okay, this is with code of ~May 4th ... a 'psql' connection to the
> database hangs solid.

Do you mean you can't make a connection at all? Is there any indication
that the postmaster is lighting off a backend for you? Since you show
a couple of zombie backends hanging around, it would seem like a good
bet that the postmaster itself is wedged and not responding to events,
but I'm not sure.

> errout is dated:

> pgsql% !ls
> ls -lt
> total 13324
> -rw------- 1 pgsql pgsql 4842715 May 7 10:57 errout.5432

> and the last few lines contain:

> ERROR: parser: parse error at or near "vpti"
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer
> pq_recvbuf: unexpected EOF on client connection
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer

> But, of course, no date/time ...

Given that the file mod time is considerably before the hang (right?)
the messages in it are probably unrelated. It does seem odd that you
have so many clients disconnecting ungracefully; what client apps are
you running?

> Since this is a production server, I can't just leave it there hung like
> that, but if someone wants to give some instructions on what to do the
> next time this happens, please feel free to do so, and I'll add that to my
> list ... maybe run a gdb command on it, since truss doesn't appear to
> help?

Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABRT ought to do) and get a backtrace from that. It might also
be worth running the postmaster with connection tracing turned on (I
forget the incantation for that, but it should be in TFM).
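
A rough sketch of the coredump step, for next time. The pid-file
location, the binary path, and the core-file name in the comments are
assumptions about the installation, not something taken from this
thread, so adjust them to match your setup. The helper name
abort_with_core is made up for illustration.

```shell
#!/bin/sh
# Sketch only: paths and file names in the comments below are assumptions.

# Permit core files in this shell and its children (may be capped by a hard limit).
ulimit -c unlimited 2>/dev/null || true

# abort_with_core PID
# Send SIGABRT so the target process dies where it stands and dumps core.
abort_with_core() {
    kill -ABRT "$1"
}

# Hypothetical usage against a hung postmaster (not executed here):
#   abort_with_core "$(head -n 1 "$PGDATA/postmaster.pid")"   # assumed pid file
#   gdb /usr/local/pgsql/bin/postmaster "$PGDATA/core"        # assumed paths
#   (gdb) bt                                                  # print the backtrace
```

The backtrace is only readable if the postmaster binary still has its
symbols, so it is worth checking that before the next hang.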

> At this time, I consider this to be a show-stopper on the release ... this
> is what happened the last time when the result appeared to be the index
> corruption

If the postmaster is hanging then it's almost certainly unrelated to
index corruption...

regards, tom lane
