Re: BUG #18152: Join condition is not pushed down to union all subquery

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: lauri(dot)kajan(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18152: Join condition is not pushed down to union all subquery
Date: 2023-10-16 06:46:43
Message-ID: CAMbWs48dZo6=coBNC8EUyGRofxAayVM5gSyiRNMfuj5YbYQCyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Oct 11, 2023 at 12:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I think this is less about "can't push down" than "can't pull up",
> specifically that prepjointree.c fails to flatten that subquery
> into an "appendrel member", per the comments in is_safe_append_member:

Yeah, I agree with you that this is more about "can't pull up". The
first leaf of the UNION ALL appendrel is a subquery that has WHERE quals
so it fails to be pulled up, as explained in the comments you pointed
out.

> I don't recall at the moment if there are fundamental reasons not to
> have per-child quals in appendrels, or if it could be done with the
> application of enough elbow grease. But it's probably not trivial.
> That comment has been there quite awhile.

I'm wondering if we can keep the per-child quals in AppendRelInfos, and
then apply these quals when we create RelOptInfos for the children of an
appendrel, specifically in expand_appendrel_subquery().

I have a go at this and it can fix the origin problem.

EXPLAIN (COSTS OFF)
WITH
targets AS (
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
)
SELECT
dealers.name dealer,
targets.vehicle,
targets.id
FROM
dealers
JOIN targets
ON dealers.name = targets.dealer_name
WHERE
dealers.id IN (54,12,456);
QUERY PLAN
--------------------------------------------------------------
Nested Loop
-> Index Scan using dealers_pkey on dealers
Index Cond: (id = ANY ('{54,12,456}'::integer[]))
-> Append
-> Bitmap Heap Scan on bikes
Recheck Cond: (dealer_name = dealers.name)
Filter: (frame_size = 52)
-> Bitmap Index Scan on bikes_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
-> Bitmap Heap Scan on cars
Recheck Cond: (dealer_name = dealers.name)
-> Bitmap Index Scan on cars_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
(13 rows)

However, when applying the per-child quals in expand_appendrel_subquery,
I cannot find a way to make these quals go through the EC machinery.
And that would cause us to miss some optimal paths, such as

EXPLAIN (COSTS OFF)
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52 and frame_size = id
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars;
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on bikes
Filter: ((frame_size = 52) AND (frame_size = id))
-> Seq Scan on cars
(4 rows)

We should have been able to get 'bikes.id = 52' from '(frame_size = 52)
AND (frame_size = id)', and then use index scan on 'bikes', but ...

Maybe we can achieve that with more efforts, but I'm not sure if this is
worthwhile. Any thoughts?

Thanks
Richard

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-10-16 09:00:00 BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
Previous Message Tom Lane 2023-10-15 15:14:45 Re: System administration functions about relation size ignore changes in the table structure