Re: PQcancel may hang in the recv call

From:	Peter Juhasz <pjuhasz(at)uhusystems(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: PQcancel may hang in the recv call
Date:	2016-05-20 15:10:29
Message-ID:	1463757029.2844.32.camel@uhusystems.com
Lists:	pgsql-general

On Thu, 2016-05-19 at 15:32 -0400, Tom Lane wrote:
> Peter Juhasz <pjuhasz(at)uhusystems(dot)com> writes:
> >
> > We've found a situation where canceling a query may cause the client
> > to hang, possibly indefinitely. This can happen if the network
> > connection fails in a specific way.
> > ...
> > However, if the network fails in a way that the connection appears
> > to have been established but subsequent packets are dropped
> > silently, this recv() call will block.
> Hmm. I would expect the recv to eventually fail based on TCP timeouts,
> but I agree that that would be much longer than you'd typically wish
> to wait.
>

When the connection does go through, the recv call returns after 60
seconds (on Linux, where I'm testing this).

The problem is that in our home-grown framework we want to use cancel
to bail out of queries that have already run for too long. At that
point we've already waited long enough; we don't want to wait even
longer.

The situation is even worse in an asynchronous, event-driven
application: in that case we must not block at all. Yet, with the
problem I've described, cancellation blocks just like in the
synchronous case, rendering the entire application unresponsive for
that period.

(It's actually even worse than that, because DBD::Pg's support for
asynchronous operation is half-finished at best: their pg_cancel
function wants to read back the confirmation of the cancellation with
PQgetResult, which blocks indefinitely if the network connection has
failed in the way I've described.)
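
To illustrate the kind of bounded wait discussed above, here is a minimal
sketch at the plain-socket level. It is not libpq code and not a patch:
recv_with_timeout and RECV_TIMED_OUT are hypothetical names, and the
sketch only shows how poll() can put an upper limit on a recv() that
would otherwise sit waiting for a silent peer.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

#define RECV_TIMED_OUT (-2)   /* hypothetical return code for "no data in time" */

/*
 * Hypothetical helper, not part of libpq: wait up to timeout_ms for the
 * socket to become readable, then read from it.
 * Returns the number of bytes read (0 means the peer closed the
 * connection), -1 on error, or RECV_TIMED_OUT if nothing arrived in time.
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* A negative timeout would make poll() block forever, which is
     * exactly what this helper is meant to avoid. */
    assert(timeout_ms >= 0);

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);  /* simplification: restarts the full timeout */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return RECV_TIMED_OUT;           /* peer stayed silent */

    return recv(fd, buf, len, 0);
}
```

A caller would treat RECV_TIMED_OUT as "the other side is not answering"
rather than as success or as an ordinary socket error.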

> >
> > Is this known?
> I do not recall anyone ever reporting something similar --- and that
> code has been like that for a long time.

I forgot to mention that I've observed this behavior with PostgreSQL
9.5.3 and 9.4.8, but I don't think the exact version matters much
because, as you say, that part of the code has not changed recently.

I find it strange that nobody has reported similar problems, though.
Does everyone else have perfect network connections that never drop
packets and never introduce random delays?

>
> >
> > Is this a bug?
> I wouldn't call it that exactly. There might be an opportunity for
> improvement here, but it's not very clear what. Just introducing a
> timeout would likely create more problems than it fixes, considering
> the evident rarity of the problem.

In our framework we had to resort to exactly that, a timeout, but we
mark the connection as unreliable and unusable if even the cancellation
times out. The point is that the application must remain responsive,
and even in case of a complete network failure (between the app server
and the database) we must be able to signal this state to the user.
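
The approach described above (a deadline on the cancel, and a connection
flagged as unusable when the deadline passes) could be sketched roughly
as follows. This is not our framework's actual code. Every name here
(db_handle, blocking_cancel_fn, cancel_with_deadline, the fake_*
callbacks) is hypothetical, and running the blocking call in a forked
child is just one assumed way to get a hard upper bound on the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper type, not a libpq structure. */
struct db_handle {
    bool unusable;               /* set once a cancel attempt has timed out */
};

/* Hypothetical callback: performs the blocking cancel, returns 0 on success. */
typedef int (*blocking_cancel_fn)(void *arg);

/* Stand-ins for demonstration only: one cancel that works, one that hangs. */
static int fake_cancel_ok(void *arg)    { (void) arg; return 0; }
static int fake_cancel_hangs(void *arg) { (void) arg; for (;;) pause(); }

/*
 * Run `cancel` in a child process and wait at most timeout_ms for it.
 * Returns 0 if the cancel reported success in time, -1 otherwise.
 * On a timeout the handle is additionally marked unusable.
 */
static int cancel_with_deadline(struct db_handle *h, blocking_cancel_fn cancel,
                                void *arg, int timeout_ms)
{
    int pipefd[2];
    pid_t pid;

    assert(h != NULL && cancel != NULL && timeout_ms >= 0);

    if (pipe(pipefd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: allowed to block */
        char ok = (cancel(arg) == 0);
        close(pipefd[0]);
        if (write(pipefd[1], &ok, 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(pipefd[1]);                     /* parent keeps only the read end */

    struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };
    char ok = 0;
    int rc = poll(&pfd, 1, timeout_ms);
    bool finished = (rc > 0 && read(pipefd[0], &ok, 1) == 1);

    if (!finished) {
        kill(pid, SIGKILL);               /* stop waiting on the child */
        if (rc == 0)
            h->unusable = true;           /* even the cancel timed out */
    }
    close(pipefd[0]);
    waitpid(pid, NULL, 0);                /* reap the child */

    return (finished && ok) ? 0 : -1;
}
```

In an event-driven application the read end of the pipe would be
registered with the event loop instead of being polled inline, so that
nothing blocks at all. Whether a given client library's cancel call can
safely run in a forked child is an assumption that has to be checked
for that library, and fork() needs extra care in a multi-threaded
process.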

Best regards,
Péter Juhász

PS. and now for something completely different: the menu on
http://yum.postgresql.org/ seems to be broken, the last two items are
wrapped around into a second line.

