From: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Anti join confusion |
Date: | 2025-02-24 01:48:18 |
Message-ID: | CAGjGUAKmkDdGq340b84BTGan6qO0+NeGy79SXW59Nd21z1phZw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Richard Guo
I found this path https://commitfest.postgresql.org/patch/3235/ already
supports anti join , But I've found that in many cases it doesn't work.It
always uses SubPlan Here's my testing process.
###########
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values (
generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values (
generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>10000)
test-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86 rows=262562
width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..32181.54 rows=863294 width=4)
-> Index Only Scan using idx_j2 on join2 t2
(cost=0.42..24492.07 rows=863294 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61 rows=262562
width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..30676.42 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2
(cost=0.42..22819.92 rows=882100 width=4)
(7 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID < 1000);
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06
rows=1865 width=4)
Index Cond: (id < 1000)
(5 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID = 1000);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1
width=4)
Index Cond: (id = 1000)
(5 rows)
Thanks
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2025-02-24 02:08:55 | Re: Conflict detection for multiple_unique_conflicts in logical replication |
Previous Message | Tom Lane | 2025-02-24 01:14:16 | Re: Statistics Import and Export |