Re: Fetch zero result rows when executing a query?

From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fetch zero result rows when executing a query?
Date: 2015-02-07 18:33:33
Message-ID: CAKFQuwZec+d0cz9qZAuujYTYhgLwmBDJJUY+oyE5MR3qgrsN8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 7, 2015 at 10:41 AM, Shay Rojansky [via PostgreSQL] <
ml-node+s1045698n5837082h66(at)n5(dot)nabble(dot)com> wrote:

> Sorry everyone, I was unexpectedly very busy and couldn't respond
> earlier... My apologies.
>
> I'll clarify just a little... I am indeed talking about the PostgreSQL
> network protocol, and not about query optimization (with LIMIT or omitting
> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
> ​ ​
> through which the user indicates they're not interested in any result rows
> whether those exist or not.
>

​​ExecuteNonQuery returns an integer while row-returning queries ​do not.
I'd argue that the API states that the user is declaring that the query
they are executing does not return any actual rows - just a count of
affected rows - not that they do not care to see what rows are returned.

> For the situation where a user does ExecuteNonQuery but the query returns
> result rows, the driver can save the needless network transfers. We can
> definitely say it's the user's fault for providing a query with a resultset
> to ExecuteNonQuery, but we *do* have the user's clear intention that no
> rows be fetched so why not act on it. I agree this isn't a terribly
> important optimization, the trigger for this question was first and
> foremost curiosity: it seems strange the protocol allows you to specify
> max_rows for any value other than 0.
>

​Yes, it does seem strange and, like Marko said, ideally would be
deprecated. The fact that it cannot handle "zero rows" seems like an
unnecessary limitation and I cannot image that any values other than 0 and
all would be of practical usage. In the case of zero returning instead the
number of rows would be more useful than simply refusing to return anything
so even if something like this is needed the current implementation is
flawed.

> Here's a possible believable use-case which doesn't involve user neglect:
> imagine some server-side function which has side-effects and also returns
> some rows. In some situations the user is interested in the result rows,
> but in others they only want the side-effect. The user would probably have
> no control over the function, and their only way to *not* transfer the
> result rows would be with a mechanism such as max_rows.
>

​Functions always return rows and so should not be executed using
"ExecuteNonQuery". In most cases action-oriented functions return a single
result-status row so ignoring that row, while likely not advisable, is not
exactly expensive.​

> Marko, regarding your general criticism of max_rows:
>
> > This seems to be a common pattern, and I think it's a *huge* mistake to
> specify maxrows=1 and/or ignore rows after the first one in
> > the driver layer. If the user says "give me the only row returned by
> this query", the interface should check that only one row is in
> > reality returned by the query. If the query returns more than one row,
> the user made a mistake in formulating the query and she
> > probably wants to know about it. If she genuinely doesn't care about
> the rows after the first one, she can always specify LIMIT 1.
>
> > For a sad example, look at PL/PgSQL's SELECT .. INTO ..; it's not
> terribly difficult to write a query which returns more than one
> > row *by mistake* and have something really bad happen later on since it
> went undetected during testing because you just
> > happened to get the expected row back first. And when you do want to
> specifically enforce it for e.g. security critical code,
> > you have to resort to really ugly hacks like window functions.
>
> There are some problems with what you say... First, the ADO.NET API
> provides a SingleRow API option which explicitly provides exactly this.
> This API option doesn't at all mean that there *should* be only one row
> (i.e. an error should be raised if otherwise), but simply that any other
> rows beyond the first should be discarded. So regardless of what we think
> best practices are on this, this behavior is mandated/specified by a major
> API.
>

​"​The resultset may contain multiple rows, which are ignored by
ExecuteScalar"; so ignore them.

The basic question here becomes - the executor already must generate, in
memory, all of the rows so is there a way to properly interact with the
server where you can request the number of rows that were generated but not
be obligated to actually pull them down to the client. This doesn't seem
like an unreasonable request but assuming that it is not currently possible
(of which I have little clue) then the question becomes who cares enough to
design and implement such a protocol enhancement.

>
> More to the point, doesn't max_rows=1 have exactly the same dangers as
> LIMIT 1? The two seem to be identical, except that one is expressed in the
> SQL query and the other at the network protocol level.
>

​The planner does not have access to network protocol level​ options while
it does know about LIMIT.

> The way I see it, if the user specifies one of them without specifying
> ORDER BY, they are explicitly saying they don't care which row comes out.
> And if their testing code fails because this is wrong, then they've made a
> mistake - IMHo this isn't a reason to kill the entire feature.
>
> In general, in my view it's beneficial to separate between the SQL queries
> and the features that the driver is supposed to provide in its API. The SQL
> may be written or managed by one entity, reused in many places (some of
> which want all rows and others which want only 1).
>
>
​Then the driver writers that need these special API behaviors are
reasonably expected to contribute to adding them to backend products that
do not already have them. The database developers are not going to take on
responsibility for the API decisions of others; and features deemed (or
that in reality are) of marginal usefulness are likely to be omitted -
intentionally or otherwise - from the official (in this case libpq)
protocol.

Expecting users to use an API without knowledge or control of the SQL that
is being executed seems like a stretch to me. Expecting the driver to
simply provide an easy way to access data from the common SQL idioms a user
might use seems like a reasonable goal and puts leaves the smarts in the
purvue of the planner. The optimization you require doesn't seem
unreasonable but also doesn't seem especially compelling - nor matter how
many people might be using ADO.NET (which provides no indication that they
are trying to use APIs that are incompatible with the queries that they are
sending.

David J.

--
View this message in context: http://postgresql.nabble.com/Fetch-zero-result-rows-when-executing-a-query-tp5836537p5837084.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Jungwirth 2015-02-07 19:01:48 Re: How do I bump a row to the front of sort efficiently
Previous Message Shay Rojansky 2015-02-07 17:40:34 Re: Fetch zero result rows when executing a query?