Re: Non Matching Records in Two Tables

From: Ken Hill <ken(at)scottshill(dot)com>
To: Markus Schaber <schabi(at)logix-tt(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Non Matching Records in Two Tables
Date: 2006-02-14 20:55:07
Message-ID: 1139950507.3083.64.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, 2006-02-08 at 22:31 +0100, Markus Schaber wrote:

> Hi, Ken,
>
> Ken Hill schrieb:
> > I need some help with a bit of SQL. I have two tables. I want to find
> > records in one table that don't match records in another table based on
> > a common column in the two tables. Both tables have a column named
> > 'key100'. I was trying something like:
> >
> > SELECT count(*)
> > FROM table1, table2
> > WHERE (table1.key100 != table2.key100);
> >
> > But the query is very slow and I finally just cancel it. Any help is
> > very much appreciated.
>
> Do you have indices on the key100 columns? Is autovacuum running, or do
> you do analyze manually?
>
> Can you send us the output from "EXPLAIN ANALYZE [your query]"?
>
> Btw, I don't think this query will do what you wanted, it basically
> creates a cross product, that means if your tables look like:
>
> schabitest=# select * from table1;
> key100 | valuea | valueb
> --------+--------+--------
> 1 | foo | bar
> 2 | blah | blubb
> 3 | manga | mungo
>
> schabitest=# select * from table2;
> key100 | valuec | valued
> --------+--------+--------
> 1 | monday | euro
> 2 | sunday | dollar
> 4 | friday | pounds
>
> Then your query will produce something like:
> schabitest=# select * from table1, table2 WHERE (table1.key100 !=
> table2.key100);
> key100 | valuea | valueb | key100 | valuec | valued
> --------+--------+--------+--------+--------+--------
> 1 | foo | bar | 2 | sunday | dollar
> 1 | foo | bar | 4 | friday | pounds
> 2 | blah | blubb | 1 | monday | euro
> 2 | blah | blubb | 4 | friday | pounds
> 3 | manga | mungo | 1 | monday | euro
> 3 | manga | mungo | 2 | sunday | dollar
> 3 | manga | mungo | 4 | friday | pounds
>
> I suggest you would like to have all records from table1 that don't have
> a corresponding record in table2:
>
> schabitest=# select * from table1 where table1.key100 not in (select
> key100 from table2);
> key100 | valuea | valueb
> --------+--------+--------
> 3 | manga | mungo
>
> HTH,
> Markus
>

Here is my query SQL:

SELECT key100 FROM ncccr10
WHERE ncccr10.key100 NOT IN (SELECT key100 FROM ncccr9);

It is is running after 30 minutes. Here is the query plan:

QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on ncccr10 (cost=0.00..20417160510.08 rows=305782 width=104)
Filter: (NOT (subplan))
SubPlan
-> Seq Scan on ncccr9 (cost=0.00..65533.71 rows=494471 width=104)
(4 rows)

Any ideas why it is so slow?

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2006-02-14 21:07:29 Re: Non Matching Records in Two Tables
Previous Message Markus Schaber 2006-02-14 16:39:03 Re: Slow update SQL