From: | Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> |
---|---|
To: | pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | [PATCH] Teach planner to further optimize sort in distinct |
Date: | 2023-01-17 19:27:54 |
Message-ID: | da9425ae-8ff7-33d9-23b3-2a3eb605e106@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, this is extension of `teach planner to evaluate multiple windows in
the optimal order` work applied to distinct operation.
Based on discussions before
(https://www.postgresql.org/message-id/flat/CAApHDvr7rSCVXzGfVa1L9pLpkKj6-s8NynK8o%2B98X9sKjejnQQ%40mail.gmail.com#e01327a3053d9281c40f281ef7105b42)
,
> All I imagine you need to do for it
> is to invent a function in pathkeys.c which is along the lines of what
> pathkeys_count_contained_in() does, but returns a List of pathkeys
> which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
> that does not exist as a pathkey in keys1. In
> create_final_distinct_paths(), you can then perform an incremental
> sort on any input_path which has a non-empty return list and in
> create_incremental_sort_path(), you'll pass presorted_keys as the
> number of pathkeys in the path, and the required pathkeys the
> input_path->pathkeys + the pathkeys returned from the new function.
There is bit confusion in wording here:
"returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1."
You mean extract common keys without ordering right?
Example: keys1 = (a,b,c), keys2 = (b,a)
returns (a,b)
and
keys1 = (a,b,c), keys = (d)
returns = ()
which translates to
needed_pathkeys = (a,b,c) = key2
input_pathkeys = (b,a) key1
returns (b,a) = common_keys
new needed_pathkeys = unique(common_keys + old needed_pathkeys)
=> new needed_pathkeys = (b,a,c)
The new needed_pathkeys matches input_pathkeys.
This is what I implemented in the patch.
The patched version yields the following plans:
set enable_hashagg=0;
set enable_seqscan=0;
explain (costs off) select distinct relname,relkind,count(*) over
(partition by
relkind) from pg_Class;
QUERY PLAN
---------------------------------------------------------
Unique
-> Incremental Sort
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg
-> Sort
Sort Key: relkind
-> Seq Scan on pg_class
(8 rows)
explain (costs off) select distinct a, b, count(*) over (partition by b,
a) from abcd;
QUERY PLAN
--------------------------------------------------------
Unique
-> Incremental Sort
Sort Key: b, a, (count(*) OVER (?))
Presorted Key: b, a
-> WindowAgg
-> Incremental Sort
Sort Key: b, a
Presorted Key: b
-> Index Scan using b_idx on abcd
(9 rows)
explain (costs off) select distinct a, b, count(*) over (partition by c,
d) from abcd;
QUERY PLAN
--------------------------------------------------------
Unique
-> Sort
Sort Key: a, b, (count(*) OVER (?))
-> WindowAgg
-> Incremental Sort
Sort Key: c, d
Presorted Key: c
-> Index Scan using c_idx on abcd
(8 rows)
Issue with index path still remains as pathkeys get purged by
truncate_useless_pathkeys
and hence are not available in create_final_distinct_paths for the above
optimizations.
I have attached a patch for the reference.
Thanks,
Ankit
Attachment | Content-Type | Size |
---|---|---|
distinct_sort_optimizations.patch | text/x-patch | 2.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Isaac Morland | 2023-01-17 19:29:04 | Re: Remove source code display from \df+? |
Previous Message | Peter Eisentraut | 2023-01-17 19:23:49 | Re: constify arguments of copy_file() and copydir() |