Re: BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mike22e(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on.
Date: 2020-11-27 19:07:42
Message-ID: 720627.1606504062@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> The query plan produced by EXPLAIN ANALYZE does not show all filters being
> applied when querying with distinct on over a union'd set of queries.

Hm? Your example has only one filter condition, and there's only one
in the plan.

=# explain verbose select * from generate_series(0,20) n1
left join (
select distinct on (id) * from (
select n2 as id, n2+2 as a from generate_series(0,10) n2
union all
select n3+5 as id, n3+12 as a from generate_series(0,10) n3
) s1 order by id, a
) s2 on s2.id = n1 where 20 is distinct from s2.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.52..1.79 rows=20 width=12)
Output: n1.n1, n2.n2, ((n2.n2 + 2))
Inner Unique: true
Hash Cond: (n1.n1 = n2.n2)
Filter: (20 IS DISTINCT FROM ((n2.n2 + 2)))
-> Function Scan on pg_catalog.generate_series n1 (cost=0.00..0.21 rows=21 width=4)
Output: n1.n1
Function Call: generate_series(0, 20)
-> Hash (cost=1.24..1.24 rows=22 width=8)
Output: n2.n2, ((n2.n2 + 2))
-> Unique (cost=0.91..1.02 rows=22 width=8)
Output: n2.n2, ((n2.n2 + 2))
-> Sort (cost=0.91..0.96 rows=22 width=8)
Output: n2.n2, ((n2.n2 + 2))
Sort Key: n2.n2, ((n2.n2 + 2))
-> Append (cost=0.00..0.42 rows=22 width=8)
-> Function Scan on pg_catalog.generate_series n2 (cost=0.00..0.14 rows=11 width=8)
Output: n2.n2, (n2.n2 + 2)
Function Call: generate_series(0, 10)
-> Function Scan on pg_catalog.generate_series n3 (cost=0.00..0.17 rows=11 width=8)
Output: (n3.n3 + 5), (n3.n3 + 12)
Function Call: generate_series(0, 10)
(22 rows)

I think the actual issue here is that EXPLAIN has no good
way to reconstruct the subquery alias "s2.a", so what it prints
is an expansion based on the first append child. Even if we
could reconstruct "s2.a", printing the expansion is more useful
and less confusing in most cases (admittedly not so much in
this one).

The extra parentheses that you see around "(n2.n2 + 2)" in the upper query
nodes are indicators that these are just references to the output
of the bottom plan node, ie "n2.n2 + 2" is only being computed
once at the n2 function scan node, and then bubbled up into the
upper levels. While we could print those references as just Vars,
in a lot of cases there'd be no very good name to use for them.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-11-27 22:13:28 Re: SV: Problem with pg_notify / listen
Previous Message David G. Johnston 2020-11-27 18:30:53 Re: Update on