From: | Dennis Haney <davh(at)diku(dot)dk> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Recursive optimization of IN subqueries |
Date: | 2004-01-23 18:36:59 |
Message-ID: | 401169CB.5030500@diku.dk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Tom Lane wrote:
>Dennis Haney <davh(at)diku(dot)dk> writes:
>
>
>>I saw it as though convert_IN_to_join rewrote the query from
>>
>>
>>select a.* from tenk1 a where a.unique1 in
>>(select c.thousand from tenk1 c where c.hundred = 99);
>>
>>
>>to
>>
>>
>>select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND
>>c.hundred = 99;
>>
>>
>>but after looking at it, I've reached the conclusion that the rewrite is
>>to this instead:
>>
>>
>>select a.* from tenk1 a, (select d.thousand from tenk1 d where
>>d.hundred = 99) as c where a.unique1 = c.thousand;
>>
>>
>
>Right. We do that, and then subsequently pull_up_subqueries transforms
>it to the other representation. The reason for this two-step approach
>is that the intermediate form is still a useful improvement if the
>subquery can't be pulled up for some reason (e.g., it's got grouping).
>
>
With improvement I can see that it can be materialized and thus used as
a normal table in the planner. Is there any additional reasons that I
can't see?
But this limited optimization makes me wonder, why the limitation to
optimizing '='?
And why must the lefthand of the sublink be a variable of the upper query?
>>except the subselect is added as a range table entry instead of a
>>subselect in the from-list (not that I understand this particular part,
>>do you mind explaining?).
>>
>>
>
>Same thing. Every entry in the from-list will have both an RTE and an
>entry in the join tree. This representation is partly historical
>(before we had outer joins, there was only the range table and no join
>tree at all), but it is convenient for many purposes.
>
>
Then I don't understand why it gives two different execution plans? And
the Query* is totally different for the two, eg. there is no RTE for the
subquery in the first query:
davh=# explain select a.* from test1 a, (select num from test1 where id = 2) as b where a.num = b.num;
QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=4.83..29.94 rows=11 width=8)
Hash Cond: ("outer".num = "inner".num)
-> Seq Scan on test1 a (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=4.82..4.82 rows=2 width=4)
-> Index Scan using test1_pkey on test1 (cost=0.00..4.82 rows=2 width=4)
Index Cond: (id = 2)
(6 rows)
davh=# explain select a.* from test1 a where a.num in (select num from test1 where id = 2);
QUERY PLAN
------------------------------------------------------------------------------------
Hash IN Join (cost=4.83..28.75 rows=6 width=8)
Hash Cond: ("outer".num = "inner".num)
-> Seq Scan on test1 a (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=4.82..4.82 rows=2 width=4)
-> Index Scan using test1_pkey on test1 (cost=0.00..4.82 rows=2 width=4)
Index Cond: (id = 2)
(6 rows)
>PS: this is a bit off-topic for pgsql-general, please pursue it on
>-hackers if you have more questions.
ok
--
Dennis
From | Date | Subject | |
---|---|---|---|
Next Message | Reece Hart | 2004-01-23 18:48:27 | Re: psql: relocation error: psql: undefined |
Previous Message | Lee Harr | 2004-01-23 18:29:28 | Re: sequence in schema -- broken default |
From | Date | Subject | |
---|---|---|---|
Next Message | Neil Conway | 2004-01-23 20:20:31 | Re: 7.5 change documentation |
Previous Message | Marc G. Fournier | 2004-01-23 18:04:13 | Re: [HACKERS] PostgreSQL installation CD based on |