Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN

From: sulfinu(at)gmail(dot)com
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Date: 2023-05-19 13:21:21
Message-ID: CAGH1kmwPpDUCe7-0fN0aRDkUuFgv2HEXh59XQY3k06We193PCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

As it always happens, I've had a better idea for an workaround *after*
sending the e-mail:
select * from
a,
lateral (select json_agg(b) from
b
where b.a = a.id) q;

Hello,
>
> I'm sorry if this message brings up once more an already settled issue,
> but there's no public list of bug reports for PostgreSQL (the release
> notes contain only *acknowledged*, *solved* bugs).
>
> This problem was noticed in version *15.1*, as I don't have the latest
> version at hand, but I've read the release notes for *15.2* and *15.3*
> and there's no sign of any related change. It's about the outcome of a
> SELECT statement which involves computing aggregate functions like
> json_agg, jsonb_agg and array_agg over columns that originate from
> outer-joined entities, when these entities are *missing*.
>
> Here's a very simple schema to illustrate the problem:
> create table a (
> id serial2 primary key,
> name text
> );
>
> create table b (
> name text primary key,
> a int2 references a(id)
> );
>
> insert into a (name) values ('a1'), ('a2');
> insert into b (name, a) values
> ('b for a1', 1), ('another b for a1', 1);
> Notice that entity named *a2* in table a has no corresponding bs. Now run
> this query:
> select a.*, json_agg(b) from
> a
> left join b on b.a = a.id
> group by a.id;
> *The aggregation column for entity a2 has value [null] (if array_agg() was
> used, the obtained value would be {NULL}).*
>
> Since the query without aggregation
> select a.*, b.* from
> a
> left join b on b.a = a.id;
> produces *an entry* for entity *a2* with NULL values for columns
> belonging to table b, *I expect the aggregation to produce either
> [{"name":null,"a":null}] or the SQL NULL for it, preferrably the latter.*
>
> I suppose there's a dillema regarding what to return: the aggregation is
> computed over one row, but its values are actually missing. The curent
> choice of PostgreSQL is somewhere in-between, which is inappropriate from
> both perspectives - a political compromise. Moreover, in the case of
> array_agg(), the returned value could have also originated from a
> single-row single-column actual NULL value!
>
> Returning a simple SQL NULL is the most appropriate choice, in line with the
> general rule that aggregations over zero rows return NULL
> <https://www.postgresql.org/docs/current/functions-aggregate.html>. For
> the database client, it has the benefit of being both *cheap* to detect
> and *distinguishable* from other cases.
>
> Although there is a workaround for this problem, it implies checking a
> stupid condition for absolutely every row in the JOIN result:
> select a.*, json_agg(b) filter (where b.name is not null) from
> a
> left join b on b.a = a.id
> group by a.id;
>
> Thoughts?
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Dippu Kumar 2023-05-19 13:25:56 Re: Need Support to Upgrade from 13.6 to 15.3
Previous Message sulfinu 2023-05-19 12:46:35 Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN