From: | Peter Childs <blue(dot)dragon(at)blueyonder(dot)co(dot)uk> |
---|---|
To: | |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: EXCEPT Queries |
Date: | 2003-02-20 08:29:17 |
Message-ID: | Pine.LNX.4.44.0302200822130.5033-100000@RedDragon.Childs |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
On Wed, 19 Feb 2003, Mark Mitchell wrote:
> I have a query that I must dedup using data in another table.
> This can be done quiet easily using an EXCEPT clause as long as both
> queries return the same result set, the problem with that is I only want
> to dedup based on a single columns value not dedup based on entire rows.
> The only other way I can see of doing this is using a NOT IN () clause,
> this seems to take much more time and resources than an EXCEPT
> statement. Is there any way to quickly dedup two tables based on only
> one row?
>
> Here is an example of what I'm currently doing.
>
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB | 000001
> JOE | 000002
>
> TABLE "B"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB | 000001
>
> To dedup table "A" using the data in table "B" I could use the
> following, except that the dedup takes place on the whole row when I
> only want it to take place on the "ACCOUNT_NUMBER" column.
>
> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> EXCEPT
> SELECT
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
> FROM "B"
>
>
> I could use this but it takes WAY to long when both result sets contain
> more than a few hundred records. It seems to increase exponentially the
> more records you add to each result set. The data sets I'm working with
> contain anywhere from 0 to 500000 records.
>
> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
> SELECT
> "B"."ACCOUNT_NUMBER"
> FROM "B"
> )
>
Big sub queries are a bad idea. Since we know where the number is
going to be why does this not work?
SELECT a.subscriber_name, a.account_number FROM a WHERE
EXISTS (SELECT b.account_number FROM b WHERE
b.account_number=a.account_number);
it should be a lot quicker if your account_number columns are
indexed.
I hope that helps
Peter Childs
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen.Thompson | 2003-02-20 10:09:42 | Re: VIEW or Stored Proc - Is this even possible? |
Previous Message | Josh Berkus | 2003-02-20 06:45:15 | Re: EXCEPT Queries |