Re: BUG #14494: Regression - Null arrays are not queryable

From: Linas Vepstas <linasvepstas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14494: Regression - Null arrays are not queryable
Date: 2017-02-18 00:17:54
Message-ID: CAHrUA34Tsqsg+aHdPj7UyNpwMO-g6tbRrFK5EVMfgUiOxpY87A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-01-13 18:38:24 +0000, linasvepstas(at)gmail(dot)com wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 14494
> > Logged by: Linas Vepstas
> > Email address: linasvepstas(at)gmail(dot)com
> > PostgreSQL version: 9.6.1
> > Operating system: Debian unstable
> > Description:
> >
> > Array handling appears to have regressed,
>
> Regressed from what? Because these don't look like they'd have changed
> recently.
>

I'd have to do forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in
C/C++, or in python or in perl, etc): they have a length, and in C/C++,
they are null-terminated. There is a concept of a null string: a string of
length zero: its a valid string, that you can treat like any other. Its not
exceptional, or different in any way. Standard string ops work on them:
they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but
is still an array, and can be handled using the same kind of code that
non-zero length arrays use. One does not have to write special-case code
for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard.
Special-case code needs to be written to handle them, since they are
exceptional. This doubles the amount of unit-testing required, and doubles
the attack surface for bugs. It is an unintuitive and surprising situation,
for those of us, like me, who have data that contains arrays, some of which
are zero-legnth arrays.

I respect that you might want to design postgres the way way it is
currently, but it is counter-intuitive and surprising, and opens the door
to surprising behaviors and bugs in user-land. It doesn't "smell" like a
rational design choice; instead, it smells like something arbitrary or
accidental.

--linas

>
> > test=> select * from foo where outgoing=null;
> > name | outgoing
> > ------+----------
> > (0 rows)
> >
> > unexpected: wanted one row here.
>
> Comparisons with NULL result in NULL, which evaluates to false in a
> WHERE clause. Try WHERE outgoing IS NULL;
>
>
> > test=> select * from foo where outgoing != '{43,67}';
> > name | outgoing
> > ------+----------
> > two | {}
> > (1 row)
> >
> > unexpected: wanted two rows here. Where did the null row go?
>
> Same answer as above. WHERE outgoing IS DISTINCT FROM ...;
>
> >
> > test=> select * from foo where array_length(outgoing,1)=0 ;
> > name | outgoing
> > ------+----------
> > (0 rows)
> >
> > Huh? there is at least one array whose length is zero,
>
> An empty array doesn't have a dimension 1 and thus results in NULL. Yes,
> that's a bit confusing.
>
> > and another that doesn't have a length...
>
> You mean the NULL one? Why would you expect a 0 as a result then?
>
> Greetings,
>
> Andres Freund
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message bjorn 2017-02-18 00:36:21 BUG #14552: tsquery converts AND operator into OR when nested inside OR operations
Previous Message gp0072000 2017-02-17 17:53:51 BUG #14551: Simple Table Creation via GUI fails due to missing semi-colon