Re: BUG #14494: Regression - Null arrays are not queryable

From: Linas Vepstas <linasvepstas(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14494: Regression - Null arrays are not queryable
Date: 2017-02-18 03:52:57
Message-ID: CAHrUA35ytxJZ1z-Ahiwo8sfybepUgmQOqJi3m2K2jX5DRcBR3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

OK. One of these is a change in behavior from earlier versions; I won't be
able to double-check to provide details for at least a few weeks. (or have
any time for other discussion)

In my case, I am layering a graph DB (a hypergraph DB) on top of SQL. The
array is a list of primary keys. Think of a "multi-tree" or a "forest": a
bunch of trees (DAG's) but these may share nodes. Thus all rows are nodes,
the array is all the children of the node, and leaf-nodes have no children:
either an empty array or a null array or a zero-length array, I'm not picky
on the representation.

--linas

On Fri, Feb 17, 2017 at 7:08 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas(at)gmail(dot)com>
> wrote:
>
>>
>> On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>>
>>> Hi,
>>>
>>> On 2017-01-13 18:38:24 +0000, linasvepstas(at)gmail(dot)com wrote:
>>> > The following bug has been logged on the website:
>>> >
>>> > Bug reference: 14494
>>> > Logged by: Linas Vepstas
>>> > Email address: linasvepstas(at)gmail(dot)com
>>> > PostgreSQL version: 9.6.1
>>> > Operating system: Debian unstable
>>> > Description:
>>> >
>>> > Array handling appears to have regressed,
>>>
>>> Regressed from what? Because these don't look like they'd have changed
>>> recently.
>>>
>>
>> I'd have to do
>> ​​
>> forensics, but the two other versions I use are 8.4 and 9.1
>>
>> Let me explain it this way: programmers have a mental model for strings
>> (in C/C++, or in python or in perl, etc): they have a length, and in C/C++,
>> they are null-terminated. There is a concept of a null string: a string of
>> length zero: its a valid string, that you can treat like any other. Its not
>> exceptional, or different in any way. Standard string ops work on them:
>> they behave like strings of length zero.
>>
>> The mental model of an array would be similar: an array of length zero,
>> but is still an array, and can be handled using the same kind of code that
>> non-zero length arrays use. One does not have to write special-case code
>> for the empty array (in must programming languages).
>>
>> In postgres, zero-length arrays appear to be confusing, non-standard.
>> Special-case code needs to be written to handle them, since they are
>> exceptional. This doubles the amount of unit-testing required, and doubles
>> the attack surface for bugs. It is an unintuitive and surprising situation,
>> for those of us, like me, who have data that contains arrays, some of which
>> are zero-legnth arrays.
>>
>> I respect that you might want to design postgres the way way it is
>> currently, but it is counter-intuitive and surprising, and opens the door
>> to surprising behaviors and bugs in user-land. It doesn't "smell" like a
>> rational design choice; instead, it smells like something arbitrary or
>> accidental.
>>
>> --linas
>>
>>
> It is a shared belief (in differing strengths) that our array behavior is
> less than ideal - but it is functional and the opinions of people already
> using our software and have working applications based around the present
> behavior are afforded considerable weight when deciding​ whether to fix
> something not really broken but also not ideal. Usually the outcome is
> that we leave things unchanged.
>
> Also, you speak of an array of having "a length" - but that isn't true in
> PostgreSQL. Arrays are multi-dimensional. So at least at a distance you
> are constructing a flawed mental model of an array based solely upon
> experience with one-dimensional arrays.
>
> In any case that only addresses:
>
> select * from foo where array_length(outgoing,1)=0 ;
>
> IOW - you make good points but so does PostgreSQL, and given present
> information pragmatism trumps ideology.
>
> That said, if you actually supply some real examples of what you are doing
> you might find people willing to point out potentially better ways to do
> the same thing in PostgreSQL without having to worry about corner cases.
>
> SQL has a concept of null for which we need to retain as consistent
> handling as possible. That covers your other items. "NULL" is not the
> same as <empty> no matter what the underlying data type is (arrays,
> strings, etc...). This covers the other two "unexpecteds".
>
> I would advise you minimize how often NULL appears in your data. You
> already seem comfortable with the concept of "empty" and in many situations
> an empty something is exactly what is meant to be conveyed by the use of
> NULL.
>
> If you really want to debate/explore the topic the -general list is the
> proper choice. This list should be used for true bug reporting which you
> have not yet done. Unintended and undocumented regressions are bugs but so
> far that hasn't been shown to be the case. The rest of the exposition is
> discussion, not bug reporting or exploring.
>
> David J.
>
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Wei Congrui 2017-02-18 04:41:39 Re: BUG #14549: pl/pgsql parser
Previous Message Casey Witt 2017-02-18 02:37:23 Re: BUG #14546: "point" type does not work with "IS DISTINCT"