Re: Client-based EOFs triggering hung queries?

From: Christopher Opena <counterveil(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Client-based EOFs triggering hung queries?
Date: 2011-05-16 21:01:42
Message-ID: BANLkTim0ONH5ZNfnYV70CZxGvm0g0PEDNg@mail.gmail.com
Lists: pgsql-general

Thanks for the reply, Tom. I admit we were a bit rushed during the
troubleshooting process; now that we know precisely how to identify these
procs and deal with them, I imagine we'll grab more info next time before
killing them, including a netstat view and a stack trace per your
recommendation.

Cheers,
-Chris.

On Mon, May 16, 2011 at 1:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Christopher Opena <counterveil(at)gmail(dot)com> writes:
> > Hello all,
> > First time poster here - probably a good sign since I've been running
> > postgresql with zero issues for the last several years! At any rate, I've
> > recently run into a strange issue. Client-based EOFs are nothing new to our
> > application; people can sometimes close a connection for a number of
> > reasons. However, on the DB side these have always been released with no
> > issue.
>
> > Today, we had 4 connections that saw either a client-based EOF or a
> > 'canceling statement due to user request', and in all 4 cases the query
> > remained active and started chewing heavily into the CPU, grinding the DB
> > server to a halt until we manually issued 'pg_cancel_backend(procpid);'
> > from the console for each procpid. None of the 4 queries were particularly
> > heavy; under EXPLAIN ANALYZE they completed in approximately 2300 ms. A
> > little long, but not too far out of the ordinary for our usual database
> > usage.
>
> It's always been the case that if the client drops the connection
> mid-query, the backend will notice that only when it next tries to send
> some data --- and even then, if the kernel isn't aware that the far end
> has dropped the connection, the kernel may accept and buffer quite a bit
> of data before blocking the backend from doing more work. So if the
> query requires a lot of processing before it starts to emit data, the
> backend can do a lot of work before noticing anything is wrong.
> I suspect that these queries were expensive and you made a mistake in
> what you tested after the fact (or maybe conditions changed, e.g.,
> autovacuum came along and updated statistics). It's also possible that
> some network glitch affected the connections such that the server's
> kernel didn't know they were lost. It's hard to say much more than that
> without a lot more evidence than you've provided. If it happens again,
> you might try poking a bit harder into the situation before you kill the
> queries --- netstat's opinion of the connection statuses might be
> interesting, and so might stack traces from the busy backends.
>
> regards, tom lane
>
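Tom's point, that a backend notices a dropped client only when it next writes and that the kernel may buffer a fair amount before the writer blocks, can be reproduced with plain TCP sockets. The following is a minimal Python sketch over loopback, assuming a Unix-like OS; it illustrates generic TCP behavior and is not PostgreSQL code, and the helper names (`_connected_pair`, `bytes_accepted_before_blocking`, `first_send_after_peer_close`) are hypothetical. The first function counts how much the kernel accepts for a peer that is connected but never reads. The second closes the peer, then shows that the first write still succeeds and only a later write fails, once the peer's RST has come back. Byte counts depend on OS buffer settings, and the network-glitch case (the server's kernel never learns the peer is gone) is not reproduced here.

```python
# Hypothetical illustration of generic TCP behavior; not taken from PostgreSQL.
import socket
import time


def _connected_pair():
    """Return (server_side, client_side) TCP sockets connected over loopback."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return server, client


def bytes_accepted_before_blocking(chunk=64 * 1024, limit=64 * 1024 * 1024):
    """Count bytes the kernel accepts for a live peer that never reads.

    Everything lands in kernel buffers (our send buffer plus the peer's
    receive buffer) until a send() would block.
    """
    server, client = _connected_pair()
    server.setblocking(False)
    total = 0
    try:
        while total < limit:
            try:
                total += server.send(b"\0" * chunk)
            except BlockingIOError:
                break  # a blocking send() would now stall the writer
    finally:
        server.close()
        client.close()
    return total


def first_send_after_peer_close():
    """Show that a vanished peer is only noticed when we write to it.

    Returns (bytes accepted by the first send, exception type of a later send).
    """
    server, client = _connected_pair()
    client.close()   # the "client EOF": peer sends FIN and goes away
    time.sleep(0.1)  # stand-in for expensive work that never touches the socket
    try:
        first = server.send(b"first row")  # our kernel still accepts this
        for _ in range(50):
            time.sleep(0.05)  # give the peer's RST time to arrive
            try:
                server.send(b"next row")
            except ConnectionError as exc:  # BrokenPipeError or ConnectionResetError
                return first, type(exc)
        return first, None
    finally:
        server.close()


if __name__ == "__main__":
    print("bytes buffered for an idle peer:", bytes_accepted_before_blocking())
    print("after peer close:", first_send_after_peer_close())
```

The sketch relies on Python ignoring SIGPIPE, so the failed write surfaces as an exception rather than killing the process. A process that keeps computing and never writes, like a backend still working on its first result row, gets no such error at all.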
