Re: Anti join confusion

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anti join confusion
Date: 2025-02-24 07:09:30
Message-ID: CAGjGUAKJw6GQA-x=H3fKKyMUkqEcxy5EnwovqyOp7kh8fgy8Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HI Tom , Tender Wang
But I saw the path test case are support ,
[image: image.png]
and my test case set the id is primary key also SubPlan
test=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual
time=0.071..209.875 rows=1220199 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 1
Buffers: shared hit=22190
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44
rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 1000)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.245 ms
Execution Time: 247.906 ms
(11 rows)

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30 rows=254208
width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..31028.21 rows=829768 width=4)
-> Index Only Scan using join2_pkey on join2 t2
(cost=0.42..23637.37 rows=829768 width=4)
Index Cond: (id > 10000)
(8 rows)

test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22
rows=10331 width=4)
Index Cond: (id < 10000)
(5 rows)

test=#

On Mon, Feb 24, 2025 at 11:26 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> wenhui qiu <qiuwenhuifx(at)gmail(dot)com> writes:
> > I found this path https://commitfest.postgresql.org/patch/3235/
> already
> > supports anti join , But I've found that in many cases it doesn't work.It
> > always uses SubPlan Here's my testing process.
>
> NOT IN is not convertible to an anti-join: the semantics are wrong
> because of the way it treats nulls in the sub-select's output.
>
> In principle you could do it if you could prove that the sub-select's
> output is never null, but we need to do this transformation long
> before we have enough information to make such a deduction.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tender Wang 2025-02-24 08:00:32 Re: Anti join confusion
Previous Message Laurenz Albe 2025-02-24 07:02:17 Re: Update docs for UUID data type