Quick Links

Re: NOT IN vs. NOT EXISTS performance

From:	Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
To:	mmoncure(at)gmail(dot)com
Cc:	david(dot)rowley(at)2ndquadrant(dot)com, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: NOT IN vs. NOT EXISTS performance
Date:	2018-11-10 00:06:15
Message-ID:	CABcidkLL09vsGfqc0_TQa8BuUW8KXRcxxr25DS1BQO3n1Sp=zQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Thanks, both!

That's a very interesting thread. I was confident this was a subject that
had been discussed--just wasn't sure where--so thank you for forwarding.

I guess the big-picture summary is that NOT IN's definition introduces
complexity (the nature of which I now understand better) that is usually
unwarranted by the question the querier is asking. So NOT EXISTS will
almost always be preferable when a subquery is involved, unless the
behavior around NULL values is specifically desired.

On Fri, Nov 9, 2018 at 8:45 AM Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Thu, Nov 8, 2018 at 3:12 PM David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> >
> > On 9 November 2018 at 08:35, Lincoln Swaine-Moore
> > <lswainemoore(at)gmail(dot)com> wrote:
> > > My primary question is: why is this approach only possible (for data
> too
> > > large for memory) when using NOT EXISTS, and not when using NOT IN?
> > >
> > > I understand that there is a slight difference in the meaning of the
> two
> > > expressions, in that NOT IN will produce NULL if there are any NULL
> values
> > > in the right hand side (in this case there are none, and the queries
> should
> > > return the same COUNT). But if anything, I would expect that to improve
> > > performance of the NOT IN operation, since a single pass through that
> data
> > > should reveal if there are any NULL values, at which point that
> information
> > > could be used to short-circuit. So I am a bit baffled.
> >
> > The problem is that the planner makes the plan and would have to know
> > beforehand that no NULLs could exist on either side of the join.
>
> Yeah, the core issue is the SQL rules that define NOT IN behaves as:
> postgres=# select 1 not in (select 2);
> ?column?
> ──────────
> t
> (1 row)
>
> postgres=# select 1 not in (select 2 union all select null);
> ?column?
> ──────────
>
> (1 row)
>
> There's a certain logic to it but it's a death sentence for performance.
>
> merlin
>

--
Lincoln Swaine-Moore

In response to

Re: NOT IN vs. NOT EXISTS performance at 2018-11-09 13:45:56 from Merlin Moncure

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jakub Glapa	2018-11-13 13:08:24	Re: dsa_allocate() faliure
Previous Message	Merlin Moncure	2018-11-09 13:45:56	Re: NOT IN vs. NOT EXISTS performance