Re: Anti join confusion

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anti join confusion
Date: 2025-02-24 08:00:32
Message-ID: CAHewXNkYHVFpPU-0yYxvNC7mnBr_WCzPRvk-xe0S9qWnPzwvOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

wenhui qiu <qiuwenhuifx(at)gmail(dot)com> 于2025年2月24日周一 15:10写道:

> HI Tom , Tender Wang
> But I saw the path test case are support ,
> [image: image.png]
>

Hi wenhui,

In the above picture, it is NOT EXISTS sub-query, which can be pulled up,
but NOT IN sub-query, which can not now.
As Tom said:
"NOT IN is not convertible to an anti-join: the semantics are wrong
because of the way it treats nulls in the sub-select's output."

and my test case set the id is primary key also SubPlan
> test=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT
> IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual
> time=0.071..209.875 rows=1220199 loops=1)
> Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
> Rows Removed by Filter: 1
> Buffers: shared hit=22190
> SubPlan 1
> -> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44
> rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
> Index Cond: (id = 1000)
> Heap Fetches: 0
> Buffers: shared hit=4
> Planning Time: 0.245 ms
> Execution Time: 247.906 ms
> (11 rows)
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
> Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
> Workers Planned: 2
> -> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30
> rows=254208 width=8)
> Filter: (NOT (ANY (id = (SubPlan 1).col1)))
> SubPlan 1
> -> Materialize (cost=0.42..31028.21 rows=829768 width=4)
> -> Index Only Scan using join2_pkey on join2 t2
> (cost=0.42..23637.37 rows=829768 width=4)
> Index Cond: (id > 10000)
> (8 rows)
>
> test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
> (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------
> Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
> Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
> SubPlan 1
> -> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22
> rows=10331 width=4)
> Index Cond: (id < 10000)
> (5 rows)
>
> test=#
>
>
--
Thanks,
Tender Wang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sadeq Dousti 2025-02-24 08:07:09 Re: psql \dh: List High-Level (Root) Tables and Indexes
Previous Message wenhui qiu 2025-02-24 07:09:30 Re: Anti join confusion