Re: [patch] libpq one-row-at-a-time API

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] libpq one-row-at-a-time API
Date: 2012-08-02 21:01:07
Message-ID: 2685.1343941267@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In principle I suppose we could hack PQconsumeInput enough that it
>> didn't damage the row buffer while still meeting its API contract of
>> clearing the read-ready condition on the socket; but it wouldn't be
>> simple or cheap to do so.

> Any code using single-row-mode is new. Any code calling consumeInput
> before fully draining rows from buffer is buggy, as it allows unlimited growth
> of network buffer, which breaks whole reason to use single-row mode.

Sorry, that argument will not fly. The API specification for
PQconsumeInput is that it can be called at any time to drain the kernel
input buffer, without changing the logical state of the PGconn. It's
not acceptable to insert a parenthetical "oh, but you'd better not try
it in single-row mode" caveat.

The reason I find this of importance is that PQconsumeInput is meant to
be used in an application's main event loop for the sole purpose of
clearing the read-ready condition on the connection's socket, so that
it can wait for other conditions. This could very well be decoupled
from the logic that is involved in testing PQisBusy and
fetching/processing actual results. (If we had not intended to allow
those activities to be decoupled, we'd not have separated PQconsumeInput
and PQisBusy in the first place.) Introducing a dependency between
these things could lead to non-reproducible, timing-dependent, very
hard-to-find bugs.

By the same token, if somebody were trying to put single-row-mode
support into an existing async application, they'd be fooling with the
parts that issue new queries and collect results, but probably not with
the main event loop. Thus, I do not believe your argument above that
"any code using single-row mode is new". The whole app is not going to
be new. It could be very hard to guarantee that there is no path of
control that invokes PQconsumeInput before the new data is actually
processed, because there was never before a reason to avoid extra
PQconsumeInput calls.

As I said, this is the exact same objection I had to the original scheme
of exposing the row buffer directly. Putting a libpq function in front
of the row buffer doesn't solve the problem if that function is
implemented in a way that's still vulnerable to misuse or race conditions.

> And now libpq allows such apps to go from 2x row size to full resultset
> size with unfortunate coding. Why?

This argument is completely nonsensical. The contract for
PQconsumeInput has always been that it consumes whatever the kernel has
available. If you don't extract data from the library before calling it
again, the library's internal buffer may grow to more than the minimum
workable size, but that's a tradeoff you may be willing to make to
simplify your application logic. I do not think that it's an
improvement to change the API spec to say either that you can't call
PQconsumeInput in certain states, or that you can but it won't absorb
data from the kernel.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2012-08-02 21:44:11 Re: [patch] libpq one-row-at-a-time API
Previous Message Marko Kreen 2012-08-02 20:24:42 Re: [patch] libpq one-row-at-a-time API