From: | 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com> |
---|---|
To: | "shawn wang" <shawn(dot)wang(dot)pg(at)gmail(dot)com>, "ggysxcq(at)gmail(dot)com" <ggysxcq(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "wjzeng" <wjzeng2012(at)gmail(dot)com> |
Subject: | 回复:Re: Is it worth pushing conditions to sublink/subplan? |
Date: | 2021-12-07 09:44:35 |
Message-ID: | 6bebbe3a-c24b-40ab-95db-19750aaef69c.wenjing.zwj@alibaba-inc.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Hackers
For my previous proposal, I developed a prototype and passed regression testing.
It works similarly to subquery's qual pushdown. We know that sublink expands
at the beginning of each level of query. At this stage, The query's conditions and
equivalence classes are not processed. But after generate_base_implied_equalities
the conditions are processed, which is why qual can push down to subquery but sublink not.
My POC implementation chose to delay the sublink expansion in the SELECT clause (targetList)
and where clause. Specifically, it is delayed after generate_base_implied_equalities. Thus,
the equivalent conditions already established in the Up level query can be easily obtained
in the sublink expansion process (make_subplan).
For example, if the up level query has a.id = 10 and the sublink query has a.id = b.id, then
we get b.id = 10 and push it down to the sublink quey. If b is a partitioned table and is
partitioned by id, then a large number of unrelated subpartitions are pruned out,
This optimizes a significant amount of Planner and SQL execution time, especially
if the partitioned table has a large number of subpartitions and is what I want.
Currently, There were two SQL failures in the regression test, because the expansion
order of sublink was changed, which did not affect the execution result of SQL.
Look forward to your suggestions on this proposal.
Thanks
Wenjing
------------------原始邮件 ------------------
发件人:shawn wang <shawn(dot)wang(dot)pg(at)gmail(dot)com>
发送时间:Wed Sep 1 10:54:50 2021
收件人:曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
抄送:PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, wjzeng <wjzeng2012(at)gmail(dot)com>
主题:Re: Is it worth pushing conditions to sublink/subplan?
I tested it the way you said and increased the number of sub-tables.
I created a hash partition table of 1000 sub-tables.
Test according to your first SQL, the optimizer cuts the unnecessary sub-tables well.
You can see the plan:
postgres=# explain analyze
postgres-# select temp.p1 from
postgres-# (
postgres(# select p1,p2 from test1.test1hashtable x where x.p1 = '1'
postgres(# union all
postgres(# (values('1','1'))
postgres(# ) temp,
postgres-# test1.test1hashtable y
postgres-# where y.p2 = temp.p2 and y.p1 = '1' and y.p1='1';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..25.55 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)
Join Filter: (x.p2 = y.p2)
-> Seq Scan on test1hashtable826 y (cost=0.00..12.75 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (p1 = '1'::text)
-> Append (cost=0.00..12.78 rows=2 width=64) (never executed)
-> Seq Scan on test1hashtable826 x (cost=0.00..12.75 rows=1 width=64) (never executed)
Filter: (p1 = '1'::text)
-> Result (cost=0.00..0.01 rows=1 width=64) (never executed)
Planning Time: 0.158 ms
Execution Time: 0.022 ms
(10 rows)
But when the second one runs, the planning time reaches 13.942ms.
The plan:
postgres=# explain analyze
postgres-# select
postgres-# y.p1,
postgres-# (Select x.p2 from test1.test1hashtable x where y.p1 =x.p1 and y.p2=x.p2) as b
postgres-# from test1.test1hashtable y where p1 = '1' and p2 = '1';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on test1hashtable826 y (cost=0.00..13318.30 rows=1 width=64) (actual time=0.004..0.047 rows=0 loops=1)
Filter: ((p1 = '1'::text) AND (p2 = '1'::text))
SubPlan 1
-> Append (cost=0.00..13305.00 rows=1000 width=32) (never executed)
-> Seq Scan on test1hashtable1 x_1 (cost=0.00..13.30 rows=1 width=32) (never executed)
Filter: ((y.p1 = p1) AND (y.p2 = p2))
-> Seq Scan on test1hashtable1000 x_1000 (cost=0.00..13.30 rows=1 width=32) (never executed)
Filter: ((y.p1 = p1) AND (y.p2 = p2))
Planning Time: 13.942 ms
Execution Time: 4.899 ms
(2006 rows)
This is a very worthwhile thing to do. In a relatively large business system, a large number of partition tables and high concurrency are often used. If the planning time is too long, this will greatly affect the business.
regards,
Shawn.
Wenjing <wenjing(dot)zwj(at)alibaba-inc(dot)com> 于2021年8月17日周二 上午10:31写道:
2021年8月16日 17:15,Wenjing <wenjing(dot)zwj(at)alibaba-inc(dot)com> 写道:
Hi Hackers,
Recently, a issue has been bothering me, This is about conditional push-down in SQL.
I use cases from regression testing as an example.
I found that the conditions (B =1) can be pushed down into the subquery, However, it cannot be pushed down to sublink/subplan.
If a sublink/subplan clause contains a partition table, it can be useful to get the conditions for pruning.
So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?
regards,
Wenjing
example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);
Sorry, I messed up the structure of the table.
It is should be:
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);
postgres-# explain (costs off)
postgres-# select temp.b from
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all
postgres(# (values(1,1))
postgres(# ) temp,
postgres-# ab y
postgres-# where y.b = temp.b and y.a = 1 and y.b=1;
QUERY PLAN
---------------------------------------------------
Nested Loop
-> Seq Scan on ab_a1_b1 y
Filter: ((b = 1) AND (a = 1))
-> Append
-> Subquery Scan on "*SELECT* 1"
-> Seq Scan on ab_a1_b1 x
Filter: ((a = 1) AND (b = 1))
-> Result
(8 rows)
The conditions (B =1) can be pushed down into the subquery.
postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
QUERY PLAN
---------------------------------------------------
Seq Scan on ab_a1_b1 y
Filter: ((a = 1) AND (b = 1))
SubPlan 1
-> Append
-> Seq Scan on ab_a1_b1 x_1
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a1_b2 x_2
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a1_b3 x_3
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b1 x_4
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b2 x_5
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b3 x_6
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b1 x_7
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b2 x_8
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b3 x_9
Filter: ((y.a = a) AND (y.b = b))
(22 rows)
The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan in targetlist.
postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
QUERY PLAN
---------------------------------------------------
Seq Scan on ab_a1_b1 y
Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
SubPlan 1
-> Append
-> Seq Scan on ab_a1_b1 x_1
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a1_b2 x_2
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a1_b3 x_3
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b1 x_4
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b2 x_5
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a2_b3 x_6
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b1 x_7
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b2 x_8
Filter: ((y.a = a) AND (y.b = b))
-> Seq Scan on ab_a3_b3 x_9
Filter: ((y.a = a) AND (y.b = b))
(22 rows)
The conditions (B=1 and A=1) cannot be pushed down to sublink/subplan in where clause.
Attachment | Content-Type | Size |
---|---|---|
0001-poc-pushdown-qual-to-sublink.patch | application/octet-stream | 43.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | kuroda.hayato@fujitsu.com | 2021-12-07 09:48:21 | RE: Allow escape in application_name |
Previous Message | osumi.takamichi@fujitsu.com | 2021-12-07 09:42:35 | RE: Failed transaction statistics to measure the logical replication progress |