Re: NOT IN vs. NOT EXISTS performance

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: NOT IN vs. NOT EXISTS performance
Date: 2018-11-08 21:11:40
Message-ID: CAKJS1f9GW13y9Kkdu85Ax5gNpvmzO3xNO5ks5SWL8+TcKc=Qiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 9 November 2018 at 08:35, Lincoln Swaine-Moore
<lswainemoore(at)gmail(dot)com> wrote:
> My primary question is: why is this approach only possible (for data too
> large for memory) when using NOT EXISTS, and not when using NOT IN?
>
> I understand that there is a slight difference in the meaning of the two
> expressions, in that NOT IN will produce NULL if there are any NULL values
> in the right hand side (in this case there are none, and the queries should
> return the same COUNT). But if anything, I would expect that to improve
> performance of the NOT IN operation, since a single pass through that data
> should reveal if there are any NULL values, at which point that information
> could be used to short-circuit. So I am a bit baffled.

The problem is that the planner makes the plan and would have to know
beforehand that no NULLs could exist on either side of the join. For
more simple cases it could make use of NOT NULL constaints, but more
complex cases exist, such as:

SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.y WHERE t2.y NOT IN(SELECT
z FROM t3);

There's a bit more reading about the complexity of this in [1]

[1] https://www.postgresql.org/message-id/flat/CAMkU%3D1zPVbez_HWao781L8PzFk%2Bd1J8VaJuhyjUHaRifk6OcUA%40mail.gmail.com#7c6d3178c18103d8508f3ec5982b1b8e

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2018-11-09 13:45:56 Re: NOT IN vs. NOT EXISTS performance
Previous Message Lincoln Swaine-Moore 2018-11-08 19:35:32 NOT IN vs. NOT EXISTS performance