Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN

From: sulfinu(at)gmail(dot)com
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Date: 2023-05-19 12:46:35
Message-ID: CAGH1kmzre_Nr+mnErVLbAgGpOAmho_VQwY+uMzmVqEzaZKkPEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,

I'm sorry if this message brings up once more an already settled issue, but
there's no public list of bug reports for PostgreSQL (the release notes
contain only *acknowledged*, *solved* bugs).

This problem was noticed in version *15.1*, as I don't have the latest
version at hand, but I've read the release notes for *15.2* and *15.3* and
there's no sign of any related change. It's about the outcome of a SELECT
statement which involves computing aggregate functions like json_agg,
jsonb_agg and array_agg over columns that originate from outer-joined
entities, when these entities are *missing*.

Here's a very simple schema to illustrate the problem:
create table a (
id serial2 primary key,
name text
);

create table b (
name text primary key,
a int2 references a(id)
);

insert into a (name) values ('a1'), ('a2');
insert into b (name, a) values
('b for a1', 1), ('another b for a1', 1);
Notice that entity named *a2* in table a has no corresponding bs. Now run
this query:
select a.*, json_agg(b) from
a
left join b on b.a = a.id
group by a.id;
*The aggregation column for entity a2 has value [null] (if array_agg() was
used, the obtained value would be {NULL}).*

Since the query without aggregation
select a.*, b.* from
a
left join b on b.a = a.id;
produces *an entry* for entity *a2* with NULL values for columns belonging
to table b, *I expect the aggregation to produce either
[{"name":null,"a":null}] or the SQL NULL for it, preferrably the latter.*

I suppose there's a dillema regarding what to return: the aggregation is
computed over one row, but its values are actually missing. The curent
choice of PostgreSQL is somewhere in-between, which is inappropriate from
both perspectives - a political compromise. Moreover, in the case of
array_agg(), the returned value could have also originated from a
single-row single-column actual NULL value!

Returning a simple SQL NULL is the most appropriate choice, in line with the
general rule that aggregations over zero rows return NULL
<https://www.postgresql.org/docs/current/functions-aggregate.html>. For the
database client, it has the benefit of being both *cheap* to detect and
*distinguishable* from other cases.

Although there is a workaround for this problem, it implies checking a
stupid condition for absolutely every row in the JOIN result:
select a.*, json_agg(b) filter (where b.name is not null) from
a
left join b on b.a = a.id
group by a.id;

Thoughts?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message sulfinu 2023-05-19 13:21:21 Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Previous Message Les 2023-05-19 10:29:15 initdb faild to initialize full text search dictionaries