Re: It happened again: Server hung up solid

From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: It happened again: Server hung up solid
Date: 2000-05-08 00:57:20
Message-ID: Pine.BSF.4.21.0005072157060.87721-100000@thelab.hub.org
Lists: pgsql-hackers

kill -ABRT does nothing; the postmaster (PID 33683) is still there afterwards:

pgsql% kill -ABRT 33683
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
pgsql 34611  0.0  0.0     0    0 ?? Z     8:43PM 0:00.00 (postgres)
pgsql 93757  0.0  0.2  1456 1104 p0 S    Wed03PM 0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024 ?? Is    7:38PM 0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048 p2 S+    8:50PM 0:00.08 -su (tcsh)
pgsql 34696  0.0  0.0   396  232 p0 R+    8:56PM 0:00.00 ps ux
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
pgsql 34611  0.0  0.0     0    0 ?? Z     8:43PM 0:00.00 (postgres)
pgsql 93757  0.0  0.2  1456 1104 p0 S    Wed03PM 0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024 ?? Is    7:38PM 0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048 p2 S+    8:50PM 0:00.08 -su (tcsh)
pgsql 34697  0.0  0.0   396  232 p0 R+    8:56PM 0:00.00 ps ux
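
One possible reason kill -ABRT has no visible effect is that the postmaster is blocking or ignoring SIGABRT at that moment: a blocked signal stays pending until it is unblocked, and an ignored one is simply discarded. A minimal sketch for checking, assuming a ps(1) that accepts the pending, sigmask and sigignore keywords and prints each mask in hex (current FreeBSD and Linux procps do). The helpers sig_in_mask and report_signal are hypothetical names, not part of any tool:

```shell
#!/bin/sh
# Sketch: how does a process treat a given signal?
# Assumes ps understands the pending/sigmask/sigignore keywords
# and prints each signal mask as a hex number.

# sig_in_mask HEXMASK SIGNUM
# Succeeds if bit (SIGNUM-1) of the hex mask is set.
sig_in_mask() {
    m=${1#0x}
    # Keep only the low 8 hex digits (signals 1-32) so that
    # shell arithmetic cannot overflow on wide 64-bit masks.
    while [ "${#m}" -gt 8 ]; do m=${m#?}; done
    [ $(( (0x$m >> ($2 - 1)) & 1 )) -eq 1 ]
}

# report_signal PID SIGNUM
# Prints whether SIGNUM is pending, blocked or ignored in PID.
report_signal() {
    pid=$1 sig=$2
    # One line holding three hex masks: pending, blocked, ignored.
    masks=$(ps -o pending= -o sigmask= -o sigignore= -p "$pid") || return 1
    set -- $masks
    [ "$#" -eq 3 ] || return 1
    sig_in_mask "$1" "$sig" && echo "signal $sig is pending"
    sig_in_mask "$2" "$sig" && echo "signal $sig is blocked"
    sig_in_mask "$3" "$sig" && echo "signal $sig is ignored"
    return 0
}
```

For example, report_signal 33683 6 (SIGABRT is signal 6 on both FreeBSD and Linux) would print which, if any, of the three cases applies; if it prints nothing, the signal is neither pending, blocked nor ignored.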

On Sun, 7 May 2000, The Hermit Hacker wrote:

>
>
> Okay, it just happened again ... no postgres backend is being started:
>
> USER    PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
> pgsql 34611  0.0  0.0     0    0 ?? Z     8:43PM 0:00.00 (postgres)
> pgsql 93757  0.0  0.2  1456 1104 p0 S    Wed03PM 0:01.16 -su (tcsh)
> pgsql 33683  0.0  0.6 38356 3024 ?? Is    7:38PM 0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
> pgsql 34677  0.0  0.2  1408 1048 p2 S     8:50PM 0:00.07 -su (tcsh)
> pgsql 34685  0.0  0.2  1652 1032 p0 S+    8:51PM 0:00.01 psql udmsearch
> pgsql 34687  0.0  0.0   400  232 p2 R+    8:51PM 0:00.00 ps ux
>
> Going to look at the connection tracing option now and see what I can come
> up with ...
>
>
> On Sun, 7 May 2000, Tom Lane wrote:
>
> > The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> > > Okay, this is with code of ~May 4th ... a 'psql' connection to the
> > > database hangs solid.
> >
> > Do you mean you can't make a connection at all? Is there any indication
> > that the postmaster is lighting off a backend for you? Since you show
> > a couple of zombie backends hanging around, it would seem like a good
> > bet that the postmaster itself is wedged and not responding to events,
> > but I'm not sure.
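
The reasoning about zombies rests on the fact that a zombie is a child that has exited but has not yet been reaped by its parent's wait() call. Since ps ux does not show the parent PID, a minimal sketch for confirming that the zombies really belong to the postmaster could look like this, assuming a ps that supports -A and -o with the stat keyword (a BSD/Linux extension, not POSIX). zombie_children is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list zombie children of a given parent PID.
# Reads "PID PPID STAT" lines on stdin, as produced by
#   ps -A -o pid= -o ppid= -o stat=
# A STAT beginning with Z marks a zombie (exited, not yet reaped).
zombie_children() {
    awk -v parent="$1" '$2 == parent && $3 ~ /^Z/ { print $1 }'
}
```

Used as ps -A -o pid= -o ppid= -o stat= | zombie_children 33683, it would print 34611 if that zombie is a child the postmaster has not reaped.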
> >
> > > errout is dated:
> >
> > > pgsql% !ls
> > > ls -lt
> > > total 13324
> > > -rw------- 1 pgsql pgsql 4842715 May 7 10:57 errout.5432
> >
> > > and the last few lines contain:
> >
> > > ERROR: parser: parse error at or near "vpti"
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> >
> > > But, of course, no date/time ...
> >
> > Given that the file mod time is considerably before the hang (right?)
> > the messages in it are probably unrelated. It does seem odd that you
> > have so many clients disconnecting ungracefully; what client apps are
> > you running?
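
This argument turns on comparing the log file's modification time with the time of the hang. As a minimal sketch, assuming a find(1) with the -mmin test (a GNU and BSD extension, not POSIX), the hypothetical helper log_quiet_for succeeds when a file has not been written for more than a given number of minutes:

```shell
#!/bin/sh
# Sketch: has FILE gone unwritten for more than MINUTES minutes?
# log_quiet_for FILE MINUTES
# find prints the name only if its mtime is older than MINUTES.
log_quiet_for() {
    [ -n "$(find "$1" -mmin +"$2" -print)" ]
}
```

Run at the time of a hang, log_quiet_for /pgsql/errout.5432 60 succeeding would mean nothing has been logged for over an hour, which supports treating the last lines of the file as unrelated to the hang.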
> >
> > > Since this is a production server, I can't just leave it there hung like
> > > that, but if someone wants to give some instructions on what to do the
> > > next time this happens, please feel free to do so, and I'll add that to my
> > > list ... maybe run a gdb command on it, since truss doesn't appear to
> > > help?
> >
> > Try killing the postmaster itself in such a way as to produce a coredump
> > (kill -ABRT ought to do) and get a backtrace from that. It might also
> > be worth running the postmaster with connection tracing turned on (I
> > forget the incantation for that, but it should be in TFM).
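
Attaching a debugger to the live process does not depend on the process acting on a signal, so it is an alternative when kill -ABRT appears to be ignored. A minimal sketch, assuming gdb is installed; backtrace, POSTMASTER_BIN and DRY_RUN are hypothetical names, and the default binary path is simply the one from the ps listing. If you do go the core-dump route instead, a core is only written if the postmaster's core-size limit allows it, and that limit is inherited when the postmaster is started (ulimit -c unlimited in sh, limit coredumpsize unlimited in tcsh).

```shell
#!/bin/sh
# Sketch: print a backtrace of a wedged postmaster with gdb.
# POSTMASTER_BIN and DRY_RUN are hypothetical knobs for this example.
POSTMASTER_BIN=${POSTMASTER_BIN:-/pgsql/bin/postmaster}

# backtrace PID_OR_CORE
#   With a PID, gdb attaches to the running process, prints its stack
#   and detaches on exit, so the process keeps running afterwards.
#   With a core file name, gdb reads the stack from the core instead
#   (gdb checks for a file of that name before treating it as a PID).
#   A non-empty DRY_RUN only prints the gdb command line.
backtrace() {
    ${DRY_RUN:+echo} gdb -batch -ex bt "$POSTMASTER_BIN" "$1"
}
```

For example, backtrace 33683 run as the pgsql user (or root) would dump the postmaster's current stack without killing it, which matters on a production box.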
> >
> > > At this time, I consider this to be a show-stopper on the release ... this
> > > is what happened the last time, when the result appeared to be index
> > > corruption.
> >
> > If the postmaster is hanging then it's almost certainly unrelated to
> > index corruption...
> >
> > regards, tom lane
> >
>
> Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
> Systems Administrator @ hub.org
> primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org
>

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org
