Re: Join Statements

From: "Andy Marden" <amarden(at)usa(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Join Statements
Date: 2002-02-26 17:20:16
Message-ID: a5ggbc$3032$1@jupiter.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Your problem here is in the use of the use of '!='. All the columns from b
will be null if there is no match. Any null result always fails a test so
null is never not equal to anything (or equal to it for that matter!). Don't
get me started on NULLs any the ludicrous ways that RDBMSs deal them them!

What you need is:

select count(a.*)
from a left outer join b on a.id = b.id
where b.id is null

This will give you the best performing query rather than NOT IN sub query.
It also gives you greater control of the query plan. Internally, these days,
this is what Oracle converts a not in sub-query to if you let it. PostgreSQL
still persists with the old way, it seems, of doing a nested-loop scan of
the sub query for each row in the outer which is usually horribly
inefficient.

The alternative solution below with the NOT ... ANY seems to give the same
query plan as the NOT IN approach

Cheers

Andy

"Mark Nielsen" <python(at)kepnet(dot)net> wrote in message
news:3C6EA55C(dot)8040905(at)kepnet(dot)net(dot)(dot)(dot)
> select distinct TABLENAME_id from TABLENAME_backup where NOT
> TABLENAME_id = ANY (select TABLENAME_id from TABLENAME)
>
> Here is an example of me getting all the ids from the backup table that
> doesn't exist in the main table. This is similar to what you want.
>
> I think I am answering your question, or close to it.
> Mark
>
> T Conti wrote:
>
> > Howdy:
> >
> > I need to put together an SQL statement that will return all of the
> > rows in table A that are not in table B. Table A and B have the same
> > primary key. For example:
> >
> > select count(a.*)
> > from a (nolock) left outer join
> > b (nolock) on a.id = b.id
> > where a.id != b.id
> >
> > This did not work. It returned 0 rows. I know that this could be
> > done very easily in a sub-select, but that seems inefficient. Is
> > there any way to accomplish what I mentioned above in the join
> > statement or is the sub-select the way to go?
> >
> > Thanks for the help,
> > Tom
> >
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Christof Glaser 2002-02-26 17:26:34 Re: Removing duplicates
Previous Message Matthew Hagerty 2002-02-26 17:16:39 Re: Removing duplicates