Re: JOIN vs. WHERE ... IN (subselect)

From: "Dean Gibson (DB Administrator)" <dba-sql(at)ultimeth(dot)net>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: JOIN vs. WHERE ... IN (subselect)
Date: 2003-05-17 15:12:24
Message-ID: 5.1.0.14.2.20030517080543.02deabc8@imaps.ultimeth.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

The word 'subselect' is not a keyword but just an alias; see the syntax under SELECT in the docs under 'SQL Commands' (7.3.2):

| ( select ) [ AS ] alias [ ( column_alias_list ) ]

Chris Linstruth wrote on 2003-05-17 07:45:
>I was just given the following. Not your exact query but replacing
>WHERE IN with JOIN ... AS SUBSELECT fixed my performance problems.
>This query wnet from 15-30 seconds to under a second.
>
>I have attempted to research exactly what Mr. Arnold did, and found
>nothing regarding JOIN ... AS SUBSELECT in the docs. Any pointers
>appreciated.
>
>On Sun, 11 May 2003, Mathieu Arnold wrote:
>>
>>
>> --Le 11/05/2003 09:48 -0700, Chris Linstruth \xe9crivait :
>>
>> |
>> | But this can take 15-30 seconds:
>> |
>> | SELECT count(radacctid) AS sessions,
>> | sum(acctsessiontime) AS connecttime
>> | FROM radacct
>> | WHERE radacctid IN
>> | (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
>> | WHERE username='cjl'
>> | AND acctstoptime IS NOT NULL
>> | AND date_trunc('month', now())=date_trunc('month',
>> | acctstoptime));
>> |
>> | There are probably many different ways to perform this query. My
>> | main problem is trying to overcome the fact that try as I might,
>> | I can't stop the occasional duplicate accounting record from being
>> | inserted so I have to weed them out, hence the "DISTINCT ON
>> | (acctsessionid)".
>>
>> try this :
>> SELECT count(radacctid) AS sessions,
>> sum(acctsessiontime) AS connecttime
>> FROM radacct
>> JOIN (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
>> WHERE username='cjl'
>> AND acctstoptime IS NOT NULL
>> AND date_trunc('month', now())=date_trunc('month',
>> acctstoptime)) AS subselect USING (radacctid);
>>
>
>--
>Chris Linstruth <cjl(at)qnet(dot)com>
>QNET
>1529 East Palmdale Blvd Suite 200
>Palmdale, CA 93550
>(661) 538-2028
>
>
>On Fri, 16 May 2003, Dean Gibson (DB Administrator) wrote:
>
>> Using PostgreSQL 7.3.2 on Linux.
>>
>> One of the types of queries that I like to do on our database is:
>>
>> SELECT * FROM table1
>> WHERE indexed_column1 IN
>> (SELECT column2 FROM table2 WHERE <condition>);
>>
>> However, in our database table1 is quite large (~1M rows), and the above query takes "forever", EVEN IF table2 CONSISTS OF ONLY ONE ROW!
>>
>> However, my third-party SQL book says that the above is equivalent to:
>>
>> SELECT table1.* FROM table1, table2
>> WHERE indexed_column1 = column2 AND <condition>;
>>
>> And indeed, the above query executes virtually instantaneously if "<condition>" results in a small number of rows from table2.
>>
>> I'm fairly new to SQL; are they really equivalent? If so, is there some reason that the first form is not optimized/transformed into the second form, or is this a planned enhancement?
>>
>> -- Dean
>>
>> ps: If indexed_column1 has the same name as column2, then the query:
>>
>> SELECT table1.* from table1
>> JOIN table2 USING( column )
>> WHERE <condition>;
>>
>> Also executes quickly. I just like the readability of the first query above, but as they say, you gotta do what works ...
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>> http://www.postgresql.org/docs/faqs/FAQ.html
>>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2003-05-17 16:13:20 Re: JOIN vs. WHERE ... IN (subselect)
Previous Message Chris Linstruth 2003-05-17 14:45:59 Re: JOIN vs. WHERE ... IN (subselect)