Anti join confusion

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Anti join confusion
Date: 2025-02-24 01:48:18
Message-ID: CAGjGUAKmkDdGq340b84BTGan6qO0+NeGy79SXW59Nd21z1phZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Richard Guo
I found this path https://commitfest.postgresql.org/patch/3235/ already
supports anti join , But I've found that in many cases it doesn't work.It
always uses SubPlan Here's my testing process.

###########
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values (
generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values (
generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values (
generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values (
generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>10000)
test-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86 rows=262562
width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..32181.54 rows=863294 width=4)
-> Index Only Scan using idx_j2 on join2 t2
(cost=0.42..24492.07 rows=863294 width=4)
Index Cond: (id > 10000)
(8 rows)

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61 rows=262562
width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..30676.42 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2
(cost=0.42..22819.92 rows=882100 width=4)
(7 rows)

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID < 1000);
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06
rows=1865 width=4)
Index Cond: (id < 1000)
(5 rows)

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID = 1000);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1
width=4)
Index Cond: (id = 1000)
(5 rows)

Thanks

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2025-02-24 02:08:55 Re: Conflict detection for multiple_unique_conflicts in logical replication
Previous Message Tom Lane 2025-02-24 01:14:16 Re: Statistics Import and Export