Re: How to cripple a postgres server

From: Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to cripple a postgres server
Date: 2002-05-28 03:15:45
Message-ID: 1022555745.3344.23.camel@ws12
Lists: pgsql-general

On Tue, 2002-05-28 at 12:44, Tom Lane wrote:
> Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au> writes:
> >> If you're seeing load peaks in excess of what would be observed with
> >> 800 active queries, then I would agree there's something to investigate
> >> here.
>
> > Yep, indeed with 800 backends doing a query every second, nothing
> > happens.
>
> > The machine itself has 1GB of RAM, and uses no swap in the above
> > situation. Instead, system time goes to 99% of CPU. The machine is a
> > dual-CPU athlon 1900 (1.5GHz).
>
> > It _only_ happens with idle connections!
>
> Hmm, you mean if the 800 other connections are *not* idle, you can
> do VACUUMs with impunity? If so, I'd agree we got a bug ...
>
> regards, tom lane

Yes, that's what I'm saying.

If you put a sleep 5 into my while loop above, the problem still happens
(albeit after a longer time).

If you put a delay in, and also make sure that each of the 800
connections does a query (I'm using "select 1;"), then the problem never
happens.
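As a rough illustration of the workaround just described (a delay, plus every connection running "select 1;"), here is a minimal Python sketch. It is not the script used in this thread. The names ping_all and keep_busy are hypothetical, and the code assumes only that each connection is a Python DB-API 2.0 object. The sqlite3 connections in the demo are stand-ins so the sketch runs without a server; against PostgreSQL you would pass connections from a DB-API driver instead.

```python
import sqlite3
import time


def ping_all(connections, query="SELECT 1"):
    """Run a trivial query on every connection; return how many answered."""
    answered = 0
    for conn in connections:
        cur = conn.cursor()
        try:
            cur.execute(query)
            cur.fetchone()
            answered += 1
        finally:
            cur.close()
        # End any transaction the driver may have opened implicitly, so the
        # connection goes back to plain idle rather than idle-in-transaction.
        conn.rollback()
    return answered


def keep_busy(connections, rounds, delay_seconds):
    """Ping every connection once per round, sleeping between rounds."""
    total = 0
    for i in range(rounds):
        total += ping_all(connections)
        if i + 1 < rounds:
            time.sleep(delay_seconds)
    return total


if __name__ == "__main__":
    # Stand-in connections (assumption): sqlite3 is used only so that the
    # sketch is runnable anywhere. The thread's setup had 800 connections.
    conns = [sqlite3.connect(":memory:") for _ in range(8)]
    print(keep_busy(conns, rounds=2, delay_seconds=0.1))
    for c in conns:
        c.close()
```

The point of the design is only that no connection stays untouched for long; the query itself is irrelevant, which is why something as cheap as "select 1;" is enough.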

Without the delay, you get a bit of load, but the load average only
reaches about 10 or so, rather than 600.

What seems to happen is that all the idle backends get woken up every
now and then to do _something_ (it seemed to coincide with getting the
SIGUSR2 to trigger an async_notify), and that shoots up the load. Making
sure all the backends have done something every now and then seems to
avoid the problem.

I suspect the comment near the async_notify code explains the problem -
where it talks about idle backends having to be woken up to clear out
pending events...

We only notice it on our production system when it's got lots of idle db
connections.

Other people seem to have spotted it (I found references to similar
effects with Google) but nobody seems to have worked out what it is -
the other people typically had the problem with web servers which keep a
pool of db connections.

Just to reiterate: this machine never swaps, it can handle more than
1000 queries per second, and it doesn't seem to run out of resources.

Stephen
