Re: Contradictory behavior of array_agg(distinct) aggregate.

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Contradictory behavior of array_agg(distinct) aggregate.
Date: 2024-12-04 07:24:40
Message-ID: CAKFQuwbbwLaJW_AAxWQvJbyEzBe9KZp0tE8jaJ_fQbxHgtL7dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, December 3, 2024, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
wrote:
>
> Is it only me who consider that current behavior of array_agg(distinct)
> contradicts to interpretation of nulls in other cases ("null" is something
> like "unknown" which means that we can not say weather two nulls are the
> same or not).

Null value handling has a few varied behaviors related to it. This
particular one is best thought of as desirably being consistent with group
by.

>
> This is why it is allowed to insert multiple nulls in the unique column.

Nowadays the DBA gets to choose which of the two behaviors a unique index
applies, which allows indexes to get on the same page as group by et al.,
thus fixing your inconsistency claim here.

> postgres=# create table t(x integer unique);
> CREATE TABLE
> postgres=# insert into t values (null),(null);
> INSERT 0 2
> postgres=# select count(distinct x) from t;
> count
> -------
> 0
> (1 row)
>
postgres=# select array_agg(distinct x) from t;
> array_agg
> -----------
> {NULL}
> (1 row)
>
> postgres=# select array_agg(x) from t;
> array_agg
> -------------
> {NULL,NULL}
> (1 row)
>
>
> So what is the number of distinct "x" values in this case? I think that
> according to SQL model - 0 (as count(distinct) returns).

1, but getting this answer computed is a non-trivial expression as the
count aggregate can’t do the counting.

> Why in this case array_agg(distinct x) returns non-empty array?

I can be convinced to see an inconsistency here. In count though, not
array_agg. The inability for count to see and thus count null values
cannot be worked around while you can always apply a filter clause to
ignore null values you don’t want.

> Yes, unlike most other aggregates, `array_agg` is not ignoring null values.
> But is it correct to treat two null values as the same (non-distinct)?

Yes, in most contexts where null values are forced to be compared to each
other they do so by defining all null values to be representationally
identical. See group by for the most authoritative reference case.

> IMHO correct result in this case should be either {} or NULL, either
> {NULL,NULL}.
>

You have a typo here somewhere…

If you want the empty array use a filter clause to make it behave
“strictly”. Producing a null output seems indefensible.

A policy that all nulls values are indistinct from one another, which is
the most prevalent one in SQL, makes the most sense to me. My gut says
that “group by col nulls [not] distinct” would be an undesirable thing to
add to the language. It was probably added to unique indexes because they
went the wrong way and needed a way to course-correct.

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-12-04 07:46:57 Re: Sort functions with specialized comparators
Previous Message Konstantin Knizhnik 2024-12-04 07:20:24 Re: Contradictory behavior of array_agg(distinct) aggregate.