Re: Postgres Disconnection problems

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Otto Blomqvist <o(dot)blomqvist(at)secomintl(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres Disconnection problems
Date: 2005-11-21 14:58:36
Message-ID: 1132585116.28788.4.camel@state.g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 2005-11-18 at 18:51, Otto Blomqvist wrote:
> Hi,
>
> We are using PostgresDAC 2.2.1 and PostgreSQL 8.0.2 on
> i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 4.0.0
> 20050412 (Red Hat 4.0.0-0.42).
>
> I perform a simple test as follows.
>
> 1. I connect to the database, which is located on a LAN.
>
> 2. I simulate Internet problems by unplugging the Ethernet cable of the
> client. There is no PSQL activity going on.
>
> 3. Plug the ethernet cable back in
>
> 4. Run some sql, which gives me a Postgres SQL error -1, Server closed
> connection unexpectedly
>
> So far so good. Problem is that the postmaster does not detect this
> connection as dead and keeps it idle for an unknown amount of time. This is
> a real problem for us because we use persistent connections to authorize
> access to a custom built 68030 based system, which has a limited number of
> "slots" that we can use. By not releasing a dead connection we are also
> holding that 68030 slot busy.

The real issue here is that TCP keepalive keeps the connection alive for
a long time. The default on linux boxen is 2+hours. In our production
environment, we dropped tcp_keepalive to 5 minutes. There are four
settings in the linux kernel:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 500

the keepalive_time tells the kernel how long to wait to "ping" a
connection after it's gone quiet. The probes and intvl tell it how many
times to try and re-awaken it and how long to wait between each. So,
with the settings shown above, a dead connection will wait 8.3 minutes,
then execute a "ping" (not really a ping, it's on a lower level than a
real ping would be) and then will wait 1.25 minutes and do it 9 times.
So, in this scenario, an idle connection left from a network failure
will take just under 20 minutes to clear.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Kratz 2005-11-21 15:50:45 Rule appears not to fire on insert w/ "except"
Previous Message Marc G. Fournier 2005-11-21 14:47:05 Testing again, ignore ...