From: | "Dean Gibson (DB Administrator)" <dba-sql(at)ultimeth(dot)net> |
---|---|
To: | <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: JOIN vs. WHERE ... IN (subselect) |
Date: | 2003-05-17 15:12:24 |
Message-ID: | 5.1.0.14.2.20030517080543.02deabc8@imaps.ultimeth.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
The word 'subselect' is not a keyword but just an alias; see the syntax under SELECT in the docs under 'SQL Commands' (7.3.2):
| ( select ) [ AS ] alias [ ( column_alias_list ) ]
Chris Linstruth wrote on 2003-05-17 07:45:
>I was just given the following. Not your exact query but replacing
>WHERE IN with JOIN ... AS SUBSELECT fixed my performance problems.
>This query wnet from 15-30 seconds to under a second.
>
>I have attempted to research exactly what Mr. Arnold did, and found
>nothing regarding JOIN ... AS SUBSELECT in the docs. Any pointers
>appreciated.
>
>On Sun, 11 May 2003, Mathieu Arnold wrote:
>>
>>
>> --Le 11/05/2003 09:48 -0700, Chris Linstruth \xe9crivait :
>>
>> |
>> | But this can take 15-30 seconds:
>> |
>> | SELECT count(radacctid) AS sessions,
>> | sum(acctsessiontime) AS connecttime
>> | FROM radacct
>> | WHERE radacctid IN
>> | (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
>> | WHERE username='cjl'
>> | AND acctstoptime IS NOT NULL
>> | AND date_trunc('month', now())=date_trunc('month',
>> | acctstoptime));
>> |
>> | There are probably many different ways to perform this query. My
>> | main problem is trying to overcome the fact that try as I might,
>> | I can't stop the occasional duplicate accounting record from being
>> | inserted so I have to weed them out, hence the "DISTINCT ON
>> | (acctsessionid)".
>>
>> try this :
>> SELECT count(radacctid) AS sessions,
>> sum(acctsessiontime) AS connecttime
>> FROM radacct
>> JOIN (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
>> WHERE username='cjl'
>> AND acctstoptime IS NOT NULL
>> AND date_trunc('month', now())=date_trunc('month',
>> acctstoptime)) AS subselect USING (radacctid);
>>
>
>--
>Chris Linstruth <cjl(at)qnet(dot)com>
>QNET
>1529 East Palmdale Blvd Suite 200
>Palmdale, CA 93550
>(661) 538-2028
>
>
>On Fri, 16 May 2003, Dean Gibson (DB Administrator) wrote:
>
>> Using PostgreSQL 7.3.2 on Linux.
>>
>> One of the types of queries that I like to do on our database is:
>>
>> SELECT * FROM table1
>> WHERE indexed_column1 IN
>> (SELECT column2 FROM table2 WHERE <condition>);
>>
>> However, in our database table1 is quite large (~1M rows), and the above query takes "forever", EVEN IF table2 CONSISTS OF ONLY ONE ROW!
>>
>> However, my third-party SQL book says that the above is equivalent to:
>>
>> SELECT table1.* FROM table1, table2
>> WHERE indexed_column1 = column2 AND <condition>;
>>
>> And indeed, the above query executes virtually instantaneously if "<condition>" results in a small number of rows from table2.
>>
>> I'm fairly new to SQL; are they really equivalent? If so, is there some reason that the first form is not optimized/transformed into the second form, or is this a planned enhancement?
>>
>> -- Dean
>>
>> ps: If indexed_column1 has the same name as column2, then the query:
>>
>> SELECT table1.* from table1
>> JOIN table2 USING( column )
>> WHERE <condition>;
>>
>> Also executes quickly. I just like the readability of the first query above, but as they say, you gotta do what works ...
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>> http://www.postgresql.org/docs/faqs/FAQ.html
>>
From | Date | Subject | |
---|---|---|---|
Next Message | Stephan Szabo | 2003-05-17 16:13:20 | Re: JOIN vs. WHERE ... IN (subselect) |
Previous Message | Chris Linstruth | 2003-05-17 14:45:59 | Re: JOIN vs. WHERE ... IN (subselect) |