JOIN vs. WHERE ... IN (subselect)

From: "Dean Gibson (DB Administrator)" <dba-sql(at)ultimeth(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: JOIN vs. WHERE ... IN (subselect)
Date: 2003-05-17 06:51:20
Message-ID: 5.1.0.14.2.20030516231811.02deac70@imaps.ultimeth.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Using PostgreSQL 7.3.2 on Linux.

One of the types of queries that I like to do on our database is:

SELECT * FROM table1
WHERE indexed_column1 IN
(SELECT column2 FROM table2 WHERE <condition>);

However, in our database table1 is quite large (~1M rows), and the above query takes "forever", EVEN IF table2 CONSISTS OF ONLY ONE ROW!

However, my third-party SQL book says that the above is equivalent to:

SELECT table1.* FROM table1, table2
WHERE indexed_column1 = column2 AND <condition>;

And indeed, the above query executes virtually instantaneously if "<condition>" results in a small number of rows from table2.

I'm fairly new to SQL; are they really equivalent? If so, is there some reason that the first form is not optimized/transformed into the second form, or is this a planned enhancement?

-- Dean

ps: If indexed_column1 has the same name as column2, then the query:

SELECT table1.* from table1
JOIN table2 USING( column )
WHERE <condition>;

Also executes quickly. I just like the readability of the first query above, but as they say, you gotta do what works ...

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Chris Linstruth 2003-05-17 14:45:59 Re: JOIN vs. WHERE ... IN (subselect)
Previous Message Mike Winter 2003-05-16 19:28:31 Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR