From: | Tender Wang <tndrwang(at)gmail(dot)com> |
---|---|
To: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Anti join confusion |
Date: | 2025-02-24 03:24:35 |
Message-ID: | CAHewXN=iKJm7ZaCG0+VfvzL7zU-HZxUVRp4q5Mk-FpxcdmQj+w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
wenhui qiu <qiuwenhuifx(at)gmail(dot)com> 于2025年2月24日周一 09:48写道:
> Hi Richard Guo
> I found this path https://commitfest.postgresql.org/patch/3235/
> already supports anti join , But I've found that in many cases it doesn't
> work.It always uses SubPlan Here's my testing process.
>
> ###########
> create table join1 (id integer,name varchar(300),k1 integer);
> create table join2 (id integer,name varchar(300),score integer);
> insert into join1 values (
> generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join1 values (
> generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join1 values (
> generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join1 values (
> generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join1 values (
> generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join1 values (
> generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
> insert into join2 values (
> generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
> insert into join2 values (
> generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
> insert into join2 values (
> generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
> insert into join2 values (
> generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
> create index idx_j1 on join1(id);
> create index idx_j2 on join2(id);
> VACUUM ANALYZE JOIN1;
> VACUUM ANALYZE JOIN2;
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 WHERE T2.ID>10000)
> test-# ;
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------
> Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
> Workers Planned: 2
> -> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86
> rows=262562 width=8)
> Filter: (NOT (ANY (id = (SubPlan 1).col1)))
> SubPlan 1
> -> Materialize (cost=0.42..32181.54 rows=863294 width=4)
> -> Index Only Scan using idx_j2 on join2 t2
> (cost=0.42..24492.07 rows=863294 width=4)
> Index Cond: (id > 10000)
> (8 rows)
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 );
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------
> Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
> Workers Planned: 2
> -> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61
> rows=262562 width=8)
> Filter: (NOT (ANY (id = (SubPlan 1).col1)))
> SubPlan 1
> -> Materialize (cost=0.42..30676.42 rows=882100 width=4)
> -> Index Only Scan using idx_j2 on join2 t2
> (cost=0.42..22819.92 rows=882100 width=4)
> (7 rows)
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 where T2.ID < 1000);
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------
> Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
> Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
> SubPlan 1
> -> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06
> rows=1865 width=4)
> Index Cond: (id < 1000)
> (5 rows)
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 where T2.ID = 1000);
> QUERY PLAN
>
> ------------------------------------------------------------------------------------
> Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
> Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
> SubPlan 1
> -> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1
> width=4)
> Index Cond: (id = 1000)
> (5 rows)
>
>
Planner now doesn't support pulling up the "NOT IN" sublink. The "NOT IN"
sublink will be transformed into SubPlan.
--
Thanks,
Tender Wang
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-02-24 03:26:25 | Re: Anti join confusion |
Previous Message | Andy Alsup | 2025-02-24 03:23:45 | Re: Update docs for UUID data type |