From: | Ken Hill <ken(at)scottshill(dot)com> |
---|---|
To: | Frank Bax <fbax(at)sympatico(dot)ca> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Non Matching Records in Two Tables |
Date: | 2006-02-08 22:13:42 |
Message-ID: | 1139436823.11150.3.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
On Wed, 2006-02-08 at 16:27 -0500, Frank Bax wrote:
> At 04:10 PM 2/8/06, Ken Hill wrote:
> >I need some help with a bit of SQL. I have two tables. I want to find
> >records in one table that don't match records in another table based on a
> >common column in the two tables. Both tables have a column named 'key100'.
> >I was trying something like:
> >
> >SELECT count(*)
> >FROM table1, table2
> >WHERE (table1.key100 != table2.key100);
> >
> >But the query is very slow and I finally just cancel it. Any help is very
> >much appreciated.
>
>
> vacuum analyse table1;
>
> vacuum analyse table2;
>
> select count(*) from table1 full outer join table2 on
> table1.key100=table2.key100 where table1.key100 is null or table2.key100 is
> null;
>
> If this is also slow, post output of "EXPLAIN ANALYSE SELECT ...."
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
OK. I added indexes on the two columns in the two tables:
CREATE INDEX key100_idex
ON ncccr9 (key100);
CREATE INDEX key100_ncccr10_idex
ON ncccr10 (key100);
Here is the analysis of the query:
csalgorithm=# EXPLAIN ANALYSE SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 =
ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
QUERY
PLAN
-------------------------------------------------------------------------------- --------------------------------------------------------
Aggregate (cost=208337.59..208337.59 rows=1 width=0) (actual
time=255723.212.. 255723.214 rows=1 loops=1)
-> Hash Left Join (cost=99523.55..207101.41 rows=494471 width=0)
(actual ti me=92326.635..255538.447 rows=38144 loops=1)
Hash Cond: ("outer".key100 = "inner".key100)
Filter: ("inner".key100 IS NULL)
-> Seq Scan on ncccr9 (cost=0.00..59360.71 rows=494471
width=104) (ac tual time=171.778..75099.734 rows=494471 loops=1)
-> Hash (cost=88438.64..88438.64 rows=611564 width=104)
(actual time= 91962.956..91962.956 rows=0 loops=1)
-> Seq Scan on ncccr10 (cost=0.00..88438.64 rows=611564
width=1 04) (actual time=11.704..76519.323 rows=611564 loops=1)
Total runtime: 255724.219 ms
(8 rows)
The result of 38,144 non-matching records seems too much:
csalgorithm=# SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 =
ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
count
-------
38144
(1 row)
Maybe I need to do a RIGHT JOIN to return the count of records in table
'ncccr10' that don't match records in 'ncccr9'?
Thanks for your help. JOINS are fairly new to me.
From | Date | Subject | |
---|---|---|---|
Next Message | pgsql | 2006-02-08 23:12:52 | Syntax for "IF" clause in SELECT |
Previous Message | Owen Jacobson | 2006-02-08 21:31:48 | Re: Non Matching Records in Two Tables |