Re: Clients disconnect but query still runs

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Jasen Betts <jasen(at)xnet(dot)co(dot)nz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Clients disconnect but query still runs
Date: 2009-07-30 11:29:54
Message-ID: 4A718432.7040405@postnewspapers.com.au
Lists: pgsql-general

Csaba Nagy wrote:
> On Thu, 2009-07-30 at 11:41 +0200, Greg Stark wrote:
>> I know this is a popular feeling. But you're throwing away decades of
>> work in making TCP reliable. You would change feelings quickly if you
>> ever faced this scenario too. All it takes is some bad memory or a bad
>> wire and you would be turning a performance drain into random
>> connection drops.
>
> But if I get bad memory or bad wire I'll get much worse problems
> already, and don't tell me it will work more reliably if you don't kill
> the connection. It's a lot better to find out sooner that you have those
> problems and fix them than having spurious errors which you'll get even
> if you don't kill the connection in case of such problems.

Transient connection issues are not infrequent, and shouldn't promptly
kill connections.

A user's wifi might drop and then re-establish service. They might bump
the Ethernet cable out (and it's inevitably lost its retaining clip). A
router _somewhere_ along the route might reboot. Etc.

That said, TCP keepalives are designed to allow for this, and only
consider the connection dead if it's failed to respond for a reasonable
period and hasn't acknowledged several requests.

> Well it lived for at least one hour (could be more, I don't remember for
> sure) keeping vacuum from doing its job on a heavily updated DB.

Unless you've changed the defaults, TCP keepalives will take several
hours to notice a dead connection - if they're enabled at all.
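
For a rough idea of the numbers, here is a small sketch. It assumes the
stock Linux settings (tcp_keepalive_time=7200, tcp_keepalive_intvl=75,
tcp_keepalive_probes=9); other platforms differ, and the function name is
just something made up for illustration.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate worst-case time before a silent peer is declared dead.

    idle     -- seconds of inactivity before the first probe is sent
    interval -- seconds between unanswered probes
    probes   -- number of unanswered probes before giving up
    """
    return idle + interval * probes


# Assumed Linux defaults; check your own system before relying on them.
total = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
print(f"{total} s, about {total / 3600:.2f} h")  # 7875 s, about 2.19 h
```

So a connection that has only been dead for an hour would not have been
noticed yet under those settings.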

> It was
> not so much about my patience as about starting to have abysmal
> performance, AFTER we fixed the initial cause of the crash, and without
> any warning, except of course I did find out immediately that bloat
> happens and found the idle transactions

Idle? I thought your issue was _active_ queries running, servicing
requests from clients that'd since ceased to care?

How did you manage to kill the client in such a way that the OS on the
client didn't send a FIN to the server anyway? Hard-reset the client
machine(s)?

> and killed them, but I imagine
> the hair-pulling for a less experienced postgres DBA. I would have also
> preferred that postgres solves this issue on its own - the network
> stack is clearly not fast enough in resolving it.

It's not really meant to happen in the first place. I do think that if
you have a lot of connections from unreliable machines (say hosts with
intermittent connectivity) then you'd want to make sure TCP keepalives
are enabled and that you've tuned the keepalive parameters to be much
more aggressive than the defaults.
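
As a sketch of what "more aggressive" might look like at the socket level:
the helper name and the 60/10/5 values below are assumptions picked for
illustration, not recommendations, and the TCP_KEEP* options are only
exposed on some platforms (Linux among them). PostgreSQL has server
settings along the same lines (tcp_keepalives_idle,
tcp_keepalives_interval, tcp_keepalives_count), which is probably where
you'd tune this for backends rather than in code.

```python
import socket


def enable_aggressive_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalives and, where supported, shorten the timers.

    With the illustrative values above, a silent peer would be declared
    dead after roughly idle + interval * probes = 110 seconds.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These per-socket options are platform specific, so only set the
    # ones this Python build actually exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_aggressive_keepalive(s)
    print("keepalive on:",
          s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```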

I thought your issue was the backend not terminating a query when the
client died while the backend was in the middle of a long-running query.
Keepalives alone won't solve that one.
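
To illustrate why: presumably a process that's busy computing isn't
reading from or writing to the client socket, so it has no occasion to
notice that the peer has gone away, keepalives or not. Something has to
actually look at the socket now and then. The following is only a sketch
of that idea in a generic server, not a description of what the
PostgreSQL backend does; peer_has_closed is a hypothetical helper.

```python
import select
import socket


def peer_has_closed(sock):
    """Non-blocking check for whether the peer has closed the connection.

    Returns True if the socket reports end-of-file or a reset, False if
    there is nothing to read or real data is waiting. MSG_PEEK leaves any
    pending data in the receive buffer for the normal read path.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    print(peer_has_closed(server_side))  # False, client still connected
    client_side.close()
    print(peer_has_closed(server_side))  # True, client closed its end
    server_side.close()
```

A long-running job would call a check like this periodically and abort
if it returns True. Note that this only catches a close the server's
network stack has already learned about (a FIN or RST). A hard-reset
client sends neither, so you'd still need keepalives to turn that
silence into an error the check can see.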

--
Craig Ringer
