Quick Links

Re: Allowing NOT IN to use ANTI joins

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Allowing NOT IN to use ANTI joins
Date:	2014-06-10 02:19:26
Message-ID:	14290.1402366766@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Monday, June 9, 2014, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Huh? The point of an antijoin (or indeed most join methods) is that we
>> *don't* have to examine the whole inner input to make a decision.

> But all hash join methods needs to examine the entire *outer* input, no?
> Have I screwed up my terminology here?

I think you're confusing inner and outer inputs --- the sub-select inside
the NOT IN is the inner input according to the way I think about it.
But I had assumed that was what you meant already.

> If you are using NOT IN, then once you find a NULL in the outer input (if
> the outer input is the in-list: clearly you can't reverse the two inputs in
> this case), you don't even need to finish reading the outer input, nor
> start reading the inner input, because all rows are automatically excluded
> by the weird semantics of NOT IN.

The point I'm trying to make is that the goal of most join types is to
give an answer without having necessarily read all of either input.
For instance, if we tried to do this with a mergejoin it wouldn't work
reliably: it might suppose that it could accept an outer row on the basis
of no match in a higher-order sort column before it'd reached any nulls
appearing in lower-order sort columns.

You might be right that we could hot-wire the hash join case in
particular, but I'm failing to see the point of expending lots of extra
effort in order to deliver a useless answer faster. If there are NULLs
in the inner input, then NOT IN is simply the wrong query to make, and
giving an empty output in a relatively short amount of time isn't going
to help clue the user in on that. (If the SQL standard would let us do
so, I'd be arguing for throwing an error if we found a NULL.)

regards, tom lane

In response to

Re: Allowing NOT IN to use ANTI joins at 2014-06-10 01:53:50 from Jeff Janes

Responses

Re: Allowing NOT IN to use ANTI joins at 2014-06-11 09:03:21 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2014-06-10 04:27:38	Re: [bug fix] Memory leak in dblink
Previous Message	Noah Misch	2014-06-10 02:06:04	Re: tests for client programs