From: | Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org> |
Subject: | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date: | 2023-01-15 17:52:47 |
Message-ID: | 4303947c-c313-ec43-d442-80c4c8e92490@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On 13/01/23 07:48, David Rowley wrote:
> It would just care if the
> pathkeys were present and return a list of pathkeys not contained so
> that an incremental sort could be done only on the returned list and a
> Unique on an empty returned list.
In create_final_distinct_paths, presorted keys are determined from
input_rel->pathlist & needed_pathkeys. Problem with input_rel->pathlist
is that, for index node, useful_pathkeys is stored in
input_rel->pathlist but this useful_pathkeys
is determined from truncate_useless_pathkeys(index_pathkeys) which
removes index_keys if ordering is different.
Hence, input_rel->pathlist returns null for select distinct b,a from ab
where a < 10; even if index is created on a.
In order to tackle this, I have added index_pathkeys in indexpath node
itself.
Although I started this patch from master, I merged changes to window sort
optimizations.
In patched version:
set enable_hashagg=0;
set enable_seqscan=0;
explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Unique (cost=10000000039.49..10000000063.73 rows=415 width=73)
-> Incremental Sort (cost=10000000039.49..10000000060.62 rows=415 width=73)
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73)
-> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65)
(8 rows)
explain select distinct b,a from ab where a < 10;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=0.71..60.05 rows=611 width=8)
-> Incremental Sort (cost=0.71..55.55 rows=900 width=8)
Sort Key: a, b
Presorted Key: a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8)
Index Cond: (a < 10)
(6 rows)
explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Unique (cost=10000021174.63..10000038095.75 rows=60 width=16)
-> Incremental Sort (cost=10000021174.63..10000036745.75 rows=180000 width=16)
Sort Key: a, b, (count(*) OVER (?))
Presorted Key: a, b
-> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16)
-> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8)
(8 rows)
explain select distinct a, b, count(*) over (partition by a,b,c) from abcd;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Unique (cost=10000021580.47..10000036629.31 rows=60 width=20)
-> Incremental Sort (cost=10000021580.47..10000035279.31 rows=180000 width=20)
Sort Key: a, b, c, (count(*) OVER (?))
Presorted Key: a, b, c
-> WindowAgg (cost=10000021561.37..10000025611.37 rows=180000 width=20)
-> Sort (cost=10000021561.37..10000022011.37 rows=180000 width=12)
Sort Key: a, b, c
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=12)
(8 rows)
explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Unique (cost=2041.88..36764.90 rows=60 width=20)
-> Incremental Sort (cost=2041.88..35414.90 rows=180000 width=20)
Sort Key: b, a, c, (count(*) OVER (?))
Presorted Key: b, a, c
-> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20)
-> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12)
Sort Key: b, a, c
Presorted Key: b
-> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12)
(9 rows)
In master:
explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Unique (cost=10000000059.50..10000000063.65 rows=415 width=73)
-> Sort (cost=10000000059.50..10000000060.54 rows=415 width=73)
Sort Key: relname, relkind, (count(*) OVER (?))
-> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73)
-> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65)
(7 rows)
explain select distinct b,a from ab where a < 10;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=72.20..78.95 rows=611 width=8)
-> Sort (cost=72.20..74.45 rows=900 width=8)
Sort Key: b, a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8)
Index Cond: (a < 10)
(5 rows)
explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Unique (cost=10000023704.77..10000041084.40 rows=60 width=16)
-> Incremental Sort (cost=10000023704.77..10000039734.40 rows=180000 width=16)
Sort Key: a, b, (count(*) OVER (?))
Presorted Key: a
-> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16)
-> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8)
Sort Key: a
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8)
(8 rows)
explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Unique (cost=45151.33..46951.33 rows=60 width=20)
-> Sort (cost=45151.33..45601.33 rows=180000 width=20)
Sort Key: a, b, (count(*) OVER (?))
-> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20)
-> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12)
Sort Key: b, a, c
Presorted Key: b
-> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12)
(8 rows)
Note: Composite keys are also handled.
create index xy_idx on xyz(x,y);
explain select distinct x,z,y from xyz;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=0.86..55.97 rows=60 width=12)
-> Incremental Sort (cost=0.86..51.47 rows=600 width=12)
Sort Key: x, y, z
Presorted Key: x, y
-> Index Scan using xy_idx on xyz (cost=0.15..32.80 rows=600 width=12)
(5 rows)
There are some cases where different kind of scan happens
explain select distinct x,y from xyz where y < 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Unique (cost=47.59..51.64 rows=60 width=8)
-> Sort (cost=47.59..48.94 rows=540 width=8)
Sort Key: x, y
-> Bitmap Heap Scan on xyz (cost=12.34..23.09 rows=540 width=8)
Recheck Cond: (y < 10)
-> Bitmap Index Scan on y_idx (cost=0.00..12.20
rows=540 width=0)
Index Cond: (y < 10)
(7 rows)
As code only checks from IndexPath (at the moment), other scan paths are
not covered.
Is it okay to cover these in same way as I did for IndexPath? (with no
limitation on this behaviour on certain path types?)
Also, I am assuming distinct pathkeys can be changed without any issues.
As changes are limited to modification in distinct path only,
I don't see this affecting other nodes. Test cases are green,
with a couple of failures in window functions (one which I had added)
and one very weird:
EXPLAIN (COSTS OFF)
SELECT DISTINCT
empno,
enroll_date,
depname,
sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Unique
-> Incremental Sort
Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?))
Presorted Key: depname
-> WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Sort
Sort Key: depname, enroll_date
-> Seq Scan on empsalary
(15 rows)
In above query plan, unique used to come after Incremental sort in the
master.
Pending:
1. Consider if Pathkey.pk_strategy and pk_nulls_first need to be
compared too, this is pending
as I have to look these up and understand them.
2. Test cases (failures and new cases)
3. Improve comments
Patch v8 attached.
Please let me know any review comments, will address these in followup patch
with pending items.
Thanks,
Ankit
Attachment | Content-Type | Size |
---|---|---|
better_windowclause_sorting_dgr-v8.patch | text/x-patch | 23.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-01-15 18:25:11 | Re: UPDATE operation terminates logical replication receiver process due to an assertion |
Previous Message | vignesh C | 2023-01-15 17:32:49 | Re: [Commitfest 2023-01] has started |