Re: Perfomance of IN-clause with many elements and possible solutions

From: Dmitry Lazurkin <dilaz03(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Perfomance of IN-clause with many elements and possible solutions
Date: 2017-07-24 22:22:30
Message-ID: 7fa30f83-2f70-e29d-bd77-d59ac0e0002a@gmail.com
Lists: pgsql-general

On 25.07.2017 01:15, David G. Johnston wrote:
> On Mon, Jul 24, 2017 at 3:12 PM, Dmitry Lazurkin <dilaz03(at)gmail(dot)com> wrote:
>
> > And I have one question. I don't understand why IN-VALUES doesn't
> > use a Semi-Join. PostgreSQL has Hash Semi-Join... What task does
> > the database use this node type for?
>
>
> Semi-Join is canonically written as:
>
> SELECT *
> FROM tbl
> WHERE EXISTS (SELECT 1 FROM tbl2 WHERE tbl.id = tbl2.id)
>
> The main difference between IN and EXISTS is NULL semantics.
>
> David J.
>

ALTER TABLE ids ALTER COLUMN id SET NOT NULL;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN :values_clause;

Aggregate  (cost=245006.46..245006.47 rows=1 width=8) (actual time=3824.095..3824.095 rows=1 loops=1)
  Buffers: shared hit=44248
  ->  Hash Join  (cost=7.50..235006.42 rows=4000019 width=0) (actual time=1.108..3327.112 rows=3998646 loops=1)
        ...

Hmmm. No Semi-Join.

PostgreSQL can use Semi-Join for IN too.
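
To make the NULL-semantics point quoted above concrete, here is a minimal sketch. It is not taken from the thread: it runs against an in-memory SQLite database through Python's standard `sqlite3` module as a stand-in for PostgreSQL, on the assumption that both follow the same SQL three-valued logic for IN and EXISTS. The tables `tbl` and `tbl2` reuse the names from the quoted query; their contents and the helper `ids` are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only the NULL handling of
# IN / EXISTS is being illustrated, not the query planner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl  (id INTEGER);
    CREATE TABLE tbl2 (id INTEGER);
    INSERT INTO tbl  VALUES (1), (2);
    INSERT INTO tbl2 VALUES (2), (NULL);
""")


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Positive forms agree: only id = 2 has a match in tbl2.
in_ids = ids(
    "SELECT id FROM tbl WHERE id IN (SELECT id FROM tbl2) ORDER BY id"
)
exists_ids = ids(
    "SELECT id FROM tbl WHERE EXISTS "
    "(SELECT 1 FROM tbl2 WHERE tbl.id = tbl2.id) ORDER BY id"
)

# Negated forms differ: "1 NOT IN (2, NULL)" evaluates to unknown, so the
# row with id = 1 is filtered out, while NOT EXISTS finds no matching row
# for id = 1 and keeps it.
not_in_ids = ids(
    "SELECT id FROM tbl WHERE id NOT IN (SELECT id FROM tbl2) ORDER BY id"
)
not_exists_ids = ids(
    "SELECT id FROM tbl WHERE NOT EXISTS "
    "(SELECT 1 FROM tbl2 WHERE tbl.id = tbl2.id) ORDER BY id"
)

print("IN:        ", in_ids)
print("EXISTS:    ", exists_ids)
print("NOT IN:    ", not_in_ids)
print("NOT EXISTS:", not_exists_ids)
```

In a plain WHERE clause the positive IN and EXISTS forms select the same rows, because an unknown result is treated like false; the difference only becomes visible once the condition is negated or the subquery can return NULL.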
