Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"
Date: 2024-05-21 11:48:19
Message-ID: 5717be79-8f7d-4039-815a-b6df22d3dc11@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21/05/2024 05:58, David Rowley wrote:
> Let this thread be for at least the coding portion of this or be my
> thread for this patch for the v18 cycle if the RMT rules in favour of
> keeping that code reverted for v17.
>
> I've attached 2 patches.
>
> 0001 is a simple revert of Tom's revert (7204f3591).
> 0002 fixes the issue reported by Hubert.
>
> If anyone wants to have a look, I'd be grateful for that. Tom did
> call for further review after this being the 4th issue reported for
> 66c0185a3.

My planner experience is a bit rusty, but I took a quick look. Looks
generally OK to me. Some comments below:

> + /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */

Duplicated word: "For for"

> /*
> * build_setop_child_paths
> * Build paths for the set op child relation denoted by 'rel'.
> *
> * interesting_pathkeys: if not NIL, also include paths that suit these
> * pathkeys, sorting any unsorted paths as required.
> * *pNumGroups: if not NULL, we estimate the number of distinct groups
> * in the result, and store it there

The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. The
function does a lot of things. How is 'child_tlist' different from
rel->reltarget for example?

'interesting_pathkeys' is modified by the call to
add_setop_child_rel_equivalences(): it adds members to the
EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is
that obvious to someone who know more about the planner?

> /*
> * Create paths to suit final sort order required for setop_pathkeys.
> * Here we'll sort the cheapest input path (if not sorted already) and
> * incremental sort any paths which are partially sorted.
> */
> is_sorted = pathkeys_count_contained_in(setop_pathkeys,
> subpath->pathkeys,
> &presorted_keys);
>
> if (!is_sorted)
> {

Maybe also mention that if it's already sorted, it's used as is.

BTW, could the same machinery be used for INTERSECT as well? There was a
brief mention of that in the original thread, but I didn't understand
the details. Not for v17, but I'm curious. I was wondering if
build_setop_child_paths() should be named build_union_child_paths(),
since it's only used with UNIONs, but I guess it could be used as is for
INTERSECT too.

# Testing

postgres=# begin; create table foo as select i from generate_series(1,
1000000) i; create index on foo (i); commit;
BEGIN
SELECT 1000000
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;
QUERY PLAN

------------------------------------------------------------------------------------------------------
Unique (cost=144370.89..149370.89 rows=1000001 width=4)
-> Sort (cost=144370.89..146870.89 rows=1000001 width=4)
Sort Key: (1)
-> Append (cost=0.00..31038.44 rows=1000001 width=4)
-> Result (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo
(cost=0.42..26038.42 rows=1000000 width=4)
(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replace
the "union" with "union all" you do get a MergeAppend.

Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' from
foo) order by 1;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------
Unique (cost=380767.54..395767.54 rows=2000000 width=36)
-> Sort (cost=380767.54..385767.54 rows=2000000 width=36)
Sort Key: foo.i, ('foo'::text)
-> Append (cost=0.42..62076.85 rows=2000000 width=36)
-> Index Only Scan using foo_i_idx on foo
(cost=0.42..26038.42 rows=1000000 width=36)
-> Index Only Scan using foo_i_idx on foo foo_1
(cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)

postgres=# explain (select 'foo', i from foo union select 'bar', i from
foo) order by 1;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------
Unique (cost=380767.54..395767.54 rows=2000000 width=36)
-> Sort (cost=380767.54..385767.54 rows=2000000 width=36)
Sort Key: ('foo'::text), foo.i
-> Append (cost=0.42..62076.85 rows=2000000 width=36)
-> Index Only Scan using foo_i_idx on foo
(cost=0.42..26038.42 rows=1000000 width=36)
-> Index Only Scan using foo_i_idx on foo foo_1
(cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)

The following two queries are the same from the user's point of view,
but one is written using WITH:

postgres=# explain (select i from foo union (select 1::int order by 1)
union select i from foo) order by 1;
QUERY PLAN

------------------------------------------------------------------------------------------------------------
Unique (cost=326083.66..336083.67 rows=2000001 width=4)
-> Sort (cost=326083.66..331083.67 rows=2000001 width=4)
Sort Key: foo.i
-> Append (cost=0.42..62076.87 rows=2000001 width=4)
-> Index Only Scan using foo_i_idx on foo
(cost=0.42..26038.42 rows=1000000 width=4)
-> Result (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo foo_1
(cost=0.42..26038.42 rows=1000000 width=4)
(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1) (select i
from foo union select i from x union select i from foo) order by 1;
QUERY PLAN

------------------------------------------------------------------------------------------------------
Unique (cost=0.89..82926.54 rows=2000001 width=4)
-> Merge Append (cost=0.89..77926.54 rows=2000001 width=4)
Sort Key: foo.i
-> Index Only Scan using foo_i_idx on foo
(cost=0.42..26038.42 rows=1000000 width=4)
-> Sort (cost=0.02..0.03 rows=1 width=4)
Sort Key: (1)
-> Result (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo foo_1
(cost=0.42..26038.42 rows=1000000 width=4)
(8 rows)

I would've expected a MergeAppend in both cases.

None of these test cases are broken as such, you just don't get the
benefit of the optimization. I suspect they might all have the same root
cause, as they all involve constants in the target list. I think that's
a pretty common use case of UNION though.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2024-05-21 12:08:13 Re: CREATE DATABASE with filesystem cloning
Previous Message Andrey M. Borodin 2024-05-21 11:29:54 Re: Injection points: preloading and runtime arguments