From: | "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql general" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: 100% CPU pg processes that don't die. |
Date: | 2008-08-11 03:53:53 |
Message-ID: | dcc563d10808102053ja35ee8ete04e9b36eaafd10d@mail.gmail.com |
Lists: | pgsql-general |
On Sat, Aug 9, 2008 at 2:54 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Sat, Aug 9, 2008 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:
>>> I'm load testing a machine, and i'm seeing idle in transaction
>>> processes that are no longer hooked to any outside client, that pull
>>> 100% CPU and can't be kill -9ed.
>>
>> To my knowledge, the only way a process can't be kill -9'd is if it's
>> stuck inside the kernel (typically, doing I/O to a nonresponsive disk).
>> There's certainly no way for a userland process to defend itself against
>> kill -9. So my immediate response would have been to look for a
>> hardware problem, or failing that a kernel bug. I see from the
>> subsequent thread that indeed hardware failure looks to be the answer,
>> but that should have been your first assumption.
>
> It was before this. That's why I'd swapped the RAID cards. It's just
> that this is the first time this has happened without killing the box,
> so I wanted to be sure it didn't look like something else to anybody.
Just as a followup: several hours later, the other machine started
producing the same effects. I'm gonna go trawl through the lkml to
see if they have any info on this problem.
The good news is that both CentOS 5.2 and Ubuntu 7.10 seem immune to
this particular bug, and have been running for 13 hours now without a
hitch.