Re: plpgsql.consistent_into

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpgsql.consistent_into
Date: 2014-01-14 00:36:49
Message-ID: 4B6AC8BD-FDE6-4944-B793-84A5A85F15E9@phlo.org
Lists: pgsql-hackers

(Responding to both of your mails here)

On Jan14, 2014, at 01:20 , Jim Nasby <jim(at)nasby(dot)net> wrote:
> On 1/13/14, 5:57 PM, Josh Berkus wrote:
>> On 01/13/2014 03:41 PM, Florian Pflug wrote:
>>> It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
>>> but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
>>> for a reason. We shouldn't be second-guessing ourselves by changing that later -
>>> not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't.
>>>
>>> (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
>>> IMHO just too late for that)
>>
>> I *really* don't want to go through all my old code to find places where
>> I used SELECT ... INTO just to pop off the first row, and ignored the
>> rest. I doubt anyone else does, either.
>
> Do you regularly have use cases where you actually want just one RANDOM row?
> I suspect the far more likely scenario is that people write code assuming they'll
> get only one row and they'll end up with extremely hard to trace bugs if that
> assumption is ever wrong.

One case that immediately comes to mind is a JOIN which sometimes returns
multiple rows, and a projection clause that only uses one of the tables
involved in the join.
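
A minimal sketch of that case, run against SQLite through Python's sqlite3
module purely as a stand-in for PostgreSQL. The tables "orders" and "items"
and their contents are made up for illustration. The join yields two rows,
but since only columns of "orders" are projected, the rows are identical and
it doesn't matter which one ends up in the target variable.

```python
import sqlite3

def projected_rows():
    """Join two hypothetical tables but project only columns of one of them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
    conn.execute("CREATE TABLE items (order_id INTEGER, sku TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 'alice')")
    # Two items for the same order, so the join produces two rows.
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (1, "b")])
    rows = conn.execute(
        "SELECT o.customer FROM orders o "
        "JOIN items i ON i.order_id = o.id WHERE o.id = 1"
    ).fetchall()
    conn.close()
    return rows

rows = projected_rows()
# More than one row comes back, yet every row carries the same value.
print(len(rows), set(rows))
```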

Another case is queries that include an ORDER BY. I don't think the patch
makes an exception for those, and even if it did, it probably wouldn't catch
all cases, e.g. an SRF which returns its rows in a deterministic order.

Or maybe you're picking a row to process next, and don't really care about
the order in which you work through them.

>> The question is, how many bugs stemmed from wrong SQL queries, and what
>> percentage of those would have been caught by this? The way I see it, there
>> are thousands of ways to screw up a query, and having it return multiple
>> rows instead of one is just one of them.
>
> A query that's simply wrong is more likely to fail consistently. Non-strict
> use of INTO is going to fail in very subtle ways (unless you actually DO want
> just the first row, in which case you should explicitly use LIMIT 1).

How so? Say you expect "SELECT * FROM view WHERE c=<n>" to only ever return
one row. Then "SELECT sum(f) FROM table JOIN view ON table.c = view.c" is
just as subtly wrong as the first query is.
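
To make that concrete, here is a minimal sketch, again using SQLite through
Python's sqlite3 module as a stand-in for PostgreSQL. The table "t" and the
plain table "v" (standing in for the view) are hypothetical, as is the data.
Once "v" unexpectedly holds two rows for c = 1, the aggregate counts the
matching f twice, and nothing raises an error.

```python
import sqlite3

def sum_over_join(view_rows):
    """Return sum(f) from t JOIN v, where v contains the given c values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c INTEGER, f INTEGER)")
    conn.execute("CREATE TABLE v (c INTEGER)")  # stand-in for the view
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 5)])
    conn.executemany("INSERT INTO v VALUES (?)", [(c,) for c in view_rows])
    (total,) = conn.execute(
        "SELECT sum(f) FROM t JOIN v ON t.c = v.c"
    ).fetchone()
    conn.close()
    return total

# v has one row per c, as the author of the query assumed.
as_assumed = sum_over_join([1, 2])
# v has two rows for c = 1: the row with f = 10 is summed twice.
with_duplicate = sum_over_join([1, 1, 2])
print(as_assumed, with_duplicate)
```

No multiple-rows check on INTO would flag the second call, since the
aggregate always returns exactly one row.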

> And if we've always had it, why on earth didn't we make STRICT the default
> behavior?

Dunno, but AFAIK pl/pgsql mimics Oracle's PL/SQL, at least in some aspects,
so maybe this is one of the areas where we just do what Oracle does.

best regards,
Florian Pflug
