From: | Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: How to cripple a postgres server |
Date: | 2002-05-28 23:27:19 |
Message-ID: | 1022628439.25604.2.camel@chinstrap |
Lists: | pgsql-general |
On Wed, 2002-05-29 at 09:08, Tom Lane wrote:
> Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au> writes:
> > I've already strace'ed the idle backend, and I can see the SIGUSR2 being
> > delivered just before everything goes bad.
>
> >> Yes, but what happens after that?
>
> > The strace stops until I manually kill the connecting process - the
> > machine stops in general until then (vmstat 1 stops producing output,
> > shells stop responding ...). So who knows what happens :(
>
> Hmm, I hadn't quite understood that you were complaining of a
> system-wide lockup and not just Postgres getting wedged. I think the
> chances are very good that this *is* a kernel bug. In any case, no
> self-respecting kernel hacker would be happy with the notion that
> a completely unprivileged user program can lock up the whole machine.
> So even if Postgres has got a problem, the kernel is clearly failing
> to defend itself adequately.
>
> Are you able to reproduce the problem with fewer than 800 backends?
> How about if you try it on a smaller machine?

Yep, on a PIII-800 with 256MB I can do it with fewer backends (I forget
how many) and only a few vacuums. It's much easier, basically, but
there's much less CPU on that machine. It also locks the machine up for
several minutes...

> Another thing that would be entertaining to try is other ways of
> releasing 800 queries at once. For example, on connection 1 do
> BEGIN; LOCK TABLE foo;
> then issue a "SELECT COUNT(*) FROM foo" on each other connection,
> and finally COMMIT on connection 1. If that creates similar misbehavior
> then I think the SI-overrun mechanism is probably not to be blamed.
>
> > ... Sometimes, the
> > SIGUSR2 does just create a very brief load spike (vmstat shows >500
> > processes on the run queue, but the next second everything is back to
> > normal and no unusual amount of CPU is consumed).
>
> That's the behavior I'd expect. We need to figure out what's different
> between that case and the cases where it locks up.
>
> regards, tom lane

Yeah. I'll try your suggestion above and report back.

Stephen
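
Editorial sketch, not part of the original message: the test Tom suggests (take a lock on connection 1, queue a "SELECT COUNT(*) FROM foo" on every other connection, then COMMIT on connection 1) could be driven by something like the following. The function name, the `connect` callable, and the `settle_seconds` delay are all assumptions made for illustration. `connect` is expected to return a DB-API 2.0 connection with autocommit off (as psycopg2 does by default), so the first statement implicitly opens the transaction and no explicit BEGIN is sent.

```python
import threading
import time


def release_queries_at_once(connect, n_waiters, table="foo", settle_seconds=1.0):
    """Hold a table lock, queue n_waiters SELECTs behind it, then release them together.

    connect: hypothetical zero-argument callable returning a DB-API 2.0
    connection with autocommit off.
    """
    locker = connect()
    cur = locker.cursor()
    # With autocommit off the driver opens the transaction implicitly,
    # so this stands in for "BEGIN; LOCK TABLE foo;" on connection 1.
    cur.execute(f"LOCK TABLE {table}")

    results = [None] * n_waiters

    def waiter(i):
        conn = connect()
        try:
            c = conn.cursor()
            # Against a real server this call blocks until the lock is released.
            c.execute(f"SELECT COUNT(*) FROM {table}")
            results[i] = c.fetchone()[0]
        finally:
            conn.close()

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(n_waiters)]
    for t in threads:
        t.start()

    # Crude assumption: a fixed pause is enough for every waiter to queue up.
    time.sleep(settle_seconds)

    # COMMIT on connection 1 releases the lock, waking all waiters at once.
    locker.commit()
    for t in threads:
        t.join()
    locker.close()
    return results
```

With psycopg2, `connect` might be `lambda: psycopg2.connect(dsn)` for some connection string `dsn`. One thread per waiting connection is used only because it is the simplest way to keep many blocked queries outstanding from a single script.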