From: | mlw <markw(at)mohawksoft(dot)com> |
---|---|
To: | Hannu Krosing <hannu(at)tm(dot)ee> |
Cc: | Hackers List <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SQL 'in' vs join. |
Date: | 2000-11-30 13:37:42 |
Message-ID: | 3A265826.B7AF2DBE@mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hannu Krosing wrote:
>
> mlw wrote:
> >
> > Why is a "select * from table1 where field in (select field from table2
> > where condition )"
> >
> > is so dramatically bad compared to:
> >
> > "select * from table1, table2 where table1.field = table2.field and
> > condition"
> >
> > I can't understand why the first query isn't optimized better than the
> > second one. The 'in' query forces a full table scan (it shouldn't) and
> > the second one uses the indexes. Does anyone know why?
>
> Its not done yet, and probably hsomewhat difficult to do in a general
> fashion
>
> > I know I am no SQL guru, but my gut tells me that the 'in' operator
> > should be far more efficient than a join.
> >
> > Here are the actual queries:
> >
> > cdinfo=# explain select trackid from zsong where muzenbr in (select
> > muzenbr from ztitles where title = 'Mulan') ;
>
> try
>
> explain
> select trackid
> from zsong
> where muzenbr in (
> select muzenbr
> from ztitles
> where title = 'Mulan'
> and ztitles.muzenbr=zsong.muzenbr
> );
>
> this should hint the current optimizer to do the right thing;
>
> -----------------
> Hannu
Nope:
cdinfo=# explain
cdinfo-# select trackid
cdinfo-# from zsong
cdinfo-# where muzenbr in (
cdinfo(# select muzenbr
cdinfo(# from ztitles
cdinfo(# where title = 'Mulan'
cdinfo(# and ztitles.muzenbr=zsong.muzenbr
cdinfo(# );
NOTICE: QUERY PLAN:
Seq Scan on zsong (cost=100000000.00..104474515.18 rows=2193213
width=4)
SubPlan
-> Index Scan using ztitles_pkey on ztitles (cost=0.00..4.05
rows=1 width=4)
But what I also find odd is, look at the components:
cdinfo=# explain select muzenbr from ztitles where title = 'Mulan' ;
NOTICE: QUERY PLAN:
Index Scan using ztitles_title_ndx on ztitles (cost=0.00..7.08 rows=1
width=4)
cdinfo=# explain select trackid from zsong where muzenbr in ( 1,2,3,4,5)
;
NOTICE: QUERY PLAN:
Index Scan using zsong_muzenbr_ndx, zsong_muzenbr_ndx,
zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx on zsong
(cost=0.00..392.66 rows=102 width=4)
Now, given the two components, each with very low costs, it chooses to
do a sequential scan on the table. I don't get it. I have have been
having no end of problems with Postgres' optimizer. It just seems to be
brain dead at times. It is a huge point of frustration to me. I am tied
to postgres in my current project, and I fear that I will not be able to
implement certain features because of this sort of behavior.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Fork | 2000-11-30 13:38:32 | Re: |
Previous Message | Arno A. Karner | 2000-11-30 13:04:34 | compiling pg 7.0.3 on sco 5.0.5 |