Re: Properly pathify the union planner

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Properly pathify the union planner
Date: 2024-03-11 06:56:15
Message-ID: CAMbWs49H5oKvsUM_SrPbrAPkNqabuxHacjWC+Qd77dmVyMyXEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 8, 2024 at 11:31 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Fri, 8 Mar 2024 at 00:39, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I would like to have another look, but it might take several days.
> > Would that be too late?
>
> Please look. Several days is fine. I'd like to start looking on Monday
> or Tuesday next week.

I've had another look, and here are some comments that came to my mind.

* There are cases where the setop_pathkeys of a subquery does not match
the union_pathkeys generated in generate_union_paths() for sorting by
the whole target list. In such cases, even if we have explicitly sorted
the paths of the subquery to meet the ordering required for its
setop_pathkeys, we cannot find appropriate ordered paths for
MergeAppend. For instance,

explain (costs off)
(select a, b from t t1 where a = b) UNION (select a, b from t t2 where a =
b);
QUERY PLAN
-----------------------------------------------------------
Unique
-> Sort
Sort Key: t1.a, t1.b
-> Append
-> Index Only Scan using t_a_b_idx on t t1
Filter: (a = b)
-> Index Only Scan using t_a_b_idx on t t2
Filter: (a = b)
(8 rows)

(Assume t_a_b_idx is a btree index on t(a, b))

In this query, the setop_pathkeys of the subqueries includes only one
PathKey because 'a' and 'b' are in the same EC inside the subqueries,
while the union_pathkeys of the whole query includes two PathKeys, one
for each target entry. After we convert the setop_pathkeys to outer
representation, we'd notice that it does not match union_pathkeys.
Consequently, we are unable to recognize that the index scan paths are
already appropriately sorted, leading us to miss the opportunity to
utilize MergeAppend.

Not sure if this case is common enough to be worth paying attention to.

* In build_setop_child_paths() we also create properly sorted partial
paths, which seems not necessary because we do not support parallel
merge append, right?

* Another is minor and relates to cosmetic matters. When we unique-ify
the result of a UNION, we take the number of distinct groups as equal to
the total input size. For the Append path and Gather path, we use
'dNumGroups', which is 'rows' of the Append path. For the MergeAppend
we use 'rows' of the MergeAppend path. I believe they are supposed to
be the same, but I think it'd be better to keep them consistent: either
use 'dNumGroups' for all the three kinds of paths, or use 'path->rows'
for each path.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mats Kindahl 2024-03-11 06:59:48 Re: Hooking into ExplainOneQuery() complicated by missing standard_ExplainOneQuery
Previous Message Andrew Dunstan 2024-03-11 06:43:17 Re: WIP Incremental JSON Parser