From: | Slavisa Garic <sgaric(at)gmail(dot)com> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, pgsql-novice(at)postgresql(dot)org |
Subject: | Re: [PERFORM] Many connections lingering |
Date: | 2005-04-13 05:09:03 |
Message-ID: | bcb55890050412220936c80013@mail.gmail.com |
Lists: | pgsql-novice pgsql-performance |
Hi Greg,
This is not a Windows server. For testing purposes the server and
client run on the same machine, which is a Fedora RC2 machine. The
same thing happens with a Debian server and client, in which case they
were two separate machines.
There are thousands (2000+) of these waiting around and each one of
them disappears after 50ish seconds. I tried the psql command line and
monitored that connection in netstat. After I did a graceful exit
(\quit) the connection changed to TIME_WAIT and sat there for around
50 seconds. I know I could do what you suggested, keeping one
connection open and making each query a full BEGIN/QUERY/COMMIT
transaction, but I was hoping to avoid that :).
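
For anyone wanting to reproduce the count, here is a minimal sketch of tallying connection states from netstat output. The function name is made up for illustration, the first three sample lines are the ones quoted further down in this thread, and the fourth (ESTABLISHED) line is invented so that more than one state shows up.

```python
from collections import Counter


def count_tcp_states(netstat_output, port):
    """Count TCP states for connections whose local or remote port matches."""
    states = Counter()
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed column layout: proto recv-q send-q local remote state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


SAMPLE = """\
tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:39504 TIME_WAIT
tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:40720 TIME_WAIT
tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:39135 TIME_WAIT
tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:41000 ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE, 43001).items():
        print(state, n)
```

On a live machine the same function could be fed the text printed by netstat instead of SAMPLE; running it every few seconds shows whether the TIME_WAIT count levels off or keeps growing.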
This is a serious problem for me as there are multiple users using our
software on our server and I want to avoid having connections open for
a long time. In the scenario mentioned below I haven't explained the
magnitude of the communication happening between Agents and the
DBServer. There could be 100 or more Agents per experiment, per user,
running on remote machines at the same time, hence we need short
transactions/pgsql connections. Agents need a reliable connection
because failure to connect could mean a loss of computation results
that were gathered over long periods of time.
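
For comparison, the single-connection approach from the quoted reply below (keep one connection open and COMMIT after each transaction) might look roughly like the sketch that follows. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server, and the `results` table, the `ResultStore` class, and `open_demo_connection` are hypothetical names, not part of our software. With PostgreSQL the connection would come from a driver such as psycopg2, whose placeholder style is `%s` rather than `?`.

```python
import sqlite3


def open_demo_connection():
    # Stand-in for the real server connection (assumption: in-memory SQLite
    # so the sketch runs without a PostgreSQL server).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (agent_id INTEGER, value REAL)")
    conn.commit()
    return conn


class ResultStore:
    """Reuses one connection; every save is its own short transaction."""

    def __init__(self, conn):
        self._conn = conn

    def save(self, agent_id, value):
        cur = self._conn.cursor()
        try:
            # The driver opens the transaction implicitly before the INSERT.
            cur.execute(
                "INSERT INTO results (agent_id, value) VALUES (?, ?)",
                (agent_id, value),
            )
            self._conn.commit()  # end the transaction right away
        except Exception:
            self._conn.rollback()  # leave the connection usable after errors
            raise
        finally:
            cur.close()

    def count(self):
        cur = self._conn.execute("SELECT COUNT(*) FROM results")
        return cur.fetchone()[0]


if __name__ == "__main__":
    store = ResultStore(open_demo_connection())
    for agent_id in range(100):
        store.save(agent_id, agent_id * 0.5)
    print(store.count(), "rows stored over a single connection")
```

The transactions stay short, but only one TCP connection is ever opened, so nothing is left behind in TIME_WAIT until the process finally disconnects.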
Thanks for the help by the way :),
Regards,
Slavisa
On 12 Apr 2005 23:27:09 -0400, Greg Stark <gsstark(at)mit(dot)edu> wrote:
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Slavisa Garic <sgaric(at)gmail(dot)com> writes:
> > > ... Now, the
> > > interesting behaviour is this. I've run netstat on the machine where
> > > my software is running and I searched for tcp connections to my PGSQL
> > > server. What I found was hundreds of lines like this:
> >
> > > tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:39504 TIME_WAIT
> > > tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:40720 TIME_WAIT
> > > tcp 0 0 remus.dstc.monash:43001 remus.dstc.monash:39135 TIME_WAIT
> >
> > This is a network-level issue: the TCP stack on your machine knows the
> > connection has been closed, but it hasn't seen an acknowledgement of
> > that fact from the other machine, and so it's remembering the connection
> > number so that it can definitively say "that connection is closed" if
> > the other machine asks. I'd guess that either you have a flaky network
> > or there's something bogus about the TCP stack on the client machine.
> > An occasional dropped FIN packet is no surprise, but hundreds of 'em
> > are suspicious.
>
> No, what Tom's describing is a different pair of states called FIN_WAIT_1 and
> FIN_WAIT_2. TIME_WAIT isn't waiting for a packet, just a timeout. This is to
> prevent any delayed packets from earlier in the connection causing problems
> with a subsequent good connection. Otherwise you could get data from the old
> connection mixed in the data for later ones.
>
> > > Now could someone explain to me what this really means and what effect
> > > it might have on the machine (the same machine where I ran this
> > > query)? Would there eventually be a shortage of available ports if
> > > this kept growing? The reason I am asking this is because one of my
> > > modules was raising an exception saying that a TCP connection could
> > > not be established to a server it needed to connect to.
>
> What it does indicate is that each query you're making is probably not just a
> separate transaction but a separate TCP connection. That's probably not
> necessary. If you have a single long-lived process you could just keep the TCP
> connection open and issue a COMMIT after each transaction. That's what I would
> recommend doing.
>
> Unless you have thousands of these TIME_WAIT connections they probably aren't
> actually directly the cause of your failure to establish connections. But yes
> it can happen.
>
> What's more likely happening here is that you're stressing the server by
> issuing so many connection attempts that you're triggering some bug, either in
> the TCP stack or Postgres that is causing some connection attempts to not be
> handled properly.
>
> I'm skeptical that there's a bug in Postgres since lots of people do in fact
> run web servers configured to open a new connection for every page. But this
> wouldn't happen to be a Windows server would it? Perhaps the networking code
> in that port doesn't do the right thing in this case?
>
> --
> greg
>
>