Re: BUG #14494: Regression - Null arrays are not queryable

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Linas Vepstas <linasvepstas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14494: Regression - Null arrays are not queryable
Date: 2017-02-18 01:08:53
Message-ID: CAKFQuwZ5bta1T7q9rHh=iQvuERfxSegxVZ-OQ4Wf9cfG02YsEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas(at)gmail(dot)com>
wrote:

>
> On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2017-01-13 18:38:24 +0000, linasvepstas(at)gmail(dot)com wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference: 14494
>> > Logged by: Linas Vepstas
>> > Email address: linasvepstas(at)gmail(dot)com
>> > PostgreSQL version: 9.6.1
>> > Operating system: Debian unstable
>> > Description:
>> >
>> > Array handling appears to have regressed,
>>
>> Regressed from what? Because these don't look like they'd have changed
>> recently.
>>
>
> I'd have to do
> ​​
> forensics, but the two other versions I use are 8.4 and 9.1
>
> Let me explain it this way: programmers have a mental model for strings
> (in C/C++, or in python or in perl, etc): they have a length, and in C/C++,
> they are null-terminated. There is a concept of a null string: a string of
> length zero: its a valid string, that you can treat like any other. Its not
> exceptional, or different in any way. Standard string ops work on them:
> they behave like strings of length zero.
>
> The mental model of an array would be similar: an array of length zero,
> but is still an array, and can be handled using the same kind of code that
> non-zero length arrays use. One does not have to write special-case code
> for the empty array (in must programming languages).
>
> In postgres, zero-length arrays appear to be confusing, non-standard.
> Special-case code needs to be written to handle them, since they are
> exceptional. This doubles the amount of unit-testing required, and doubles
> the attack surface for bugs. It is an unintuitive and surprising situation,
> for those of us, like me, who have data that contains arrays, some of which
> are zero-legnth arrays.
>
> I respect that you might want to design postgres the way way it is
> currently, but it is counter-intuitive and surprising, and opens the door
> to surprising behaviors and bugs in user-land. It doesn't "smell" like a
> rational design choice; instead, it smells like something arbitrary or
> accidental.
>
> --linas
>
>
It is a shared belief (in differing strengths) that our array behavior is
less than ideal - but it is functional and the opinions of people already
using our software and have working applications based around the present
behavior are afforded considerable weight when deciding​ whether to fix
something not really broken but also not ideal. Usually the outcome is
that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in
PostgreSQL. Arrays are multi-dimensional. So at least at a distance you
are constructing a flawed mental model of an array based solely upon
experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present
information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing
you might find people willing to point out potentially better ways to do
the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent
handling as possible. That covers your other items. "NULL" is not the
same as <empty> no matter what the underlying data type is (arrays,
strings, etc...). This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data. You
already seem comfortable with the concept of "empty" and in many situations
an empty something is exactly what is meant to be conveyed by the use of
NULL.

If you really want to debate/explore the topic the -general list is the
proper choice. This list should be used for true bug reporting which you
have not yet done. Unintended and undocumented regressions are bugs but so
far that hasn't been shown to be the case. The rest of the exposition is
discussion, not bug reporting or exploring.

David J.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-02-18 01:24:25 Re: BUG #14552: tsquery converts AND operator into OR when nested inside OR operations
Previous Message bjorn 2017-02-18 00:36:21 BUG #14552: tsquery converts AND operator into OR when nested inside OR operations