Re: Non Matching Records in Two Tables

From: Ken Hill <ken(at)scottshill(dot)com>
To: Frank Bax <fbax(at)sympatico(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Non Matching Records in Two Tables
Date: 2006-02-08 22:13:42
Message-ID: 1139436823.11150.3.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, 2006-02-08 at 16:27 -0500, Frank Bax wrote:

> At 04:10 PM 2/8/06, Ken Hill wrote:
> >I need some help with a bit of SQL. I have two tables. I want to find
> >records in one table that don't match records in another table based on a
> >common column in the two tables. Both tables have a column named 'key100'.
> >I was trying something like:
> >
> >SELECT count(*)
> >FROM table1, table2
> >WHERE (table1.key100 != table2.key100);
> >
> >But the query is very slow and I finally just cancel it. Any help is very
> >much appreciated.
>
>
> vacuum analyse table1;
>
> vacuum analyse table2;
>
> select count(*) from table1 full outer join table2 on
> table1.key100=table2.key100 where table1.key100 is null or table2.key100 is
> null;
>
> If this is also slow, post output of "EXPLAIN ANALYSE SELECT ...."
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

OK. I added indexes on the two columns in the two tables:

CREATE INDEX key100_idex
ON ncccr9 (key100);

CREATE INDEX key100_ncccr10_idex
ON ncccr10 (key100);

Here is the analysis of the query:

csalgorithm=# EXPLAIN ANALYSE SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 =
ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
QUERY
PLAN
-------------------------------------------------------------------------------- --------------------------------------------------------
Aggregate (cost=208337.59..208337.59 rows=1 width=0) (actual
time=255723.212.. 255723.214 rows=1 loops=1)
-> Hash Left Join (cost=99523.55..207101.41 rows=494471 width=0)
(actual ti me=92326.635..255538.447 rows=38144 loops=1)
Hash Cond: ("outer".key100 = "inner".key100)
Filter: ("inner".key100 IS NULL)
-> Seq Scan on ncccr9 (cost=0.00..59360.71 rows=494471
width=104) (ac tual time=171.778..75099.734 rows=494471 loops=1)
-> Hash (cost=88438.64..88438.64 rows=611564 width=104)
(actual time= 91962.956..91962.956 rows=0 loops=1)
-> Seq Scan on ncccr10 (cost=0.00..88438.64 rows=611564
width=1 04) (actual time=11.704..76519.323 rows=611564 loops=1)
Total runtime: 255724.219 ms
(8 rows)

The result of 38,144 non-matching records seems too much:

csalgorithm=# SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 =
ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
count
-------
38144
(1 row)

Maybe I need to do a RIGHT JOIN to return the count of records in table
'ncccr10' that don't match records in 'ncccr9'?

Thanks for your help. JOINS are fairly new to me.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message pgsql 2006-02-08 23:12:52 Syntax for "IF" clause in SELECT
Previous Message Owen Jacobson 2006-02-08 21:31:48 Re: Non Matching Records in Two Tables