Why is not using the index

From: Luis Cornide Arce <lcornide(at)almabioinfo(dot)com>
To: PGSL-PERFORMANCE LIST <pgsql-performance(at)postgresql(dot)org>
Subject: Why is not using the index
Date: 2005-08-11 11:33:37
Message-ID: 42FB3791.2060200@almabioinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everyone,

I have some problems with a quite long query and the plan postgreSQL is
choosing. The query joins 12 tables and in the WHERE clause I use a IN
expression with a lot of identifiers (up to 2000). The problem is that
the planner is proposing a seq_scan on two tables 2M rows each
(internalexpressionprofile and expressionprofile)

I have just try this query (after doing a vacuum analyze), in the 'IN'
clause there are 1552 identifiers, and the query should return 14K rows.
I'm using a PostgreSQL 8.0.2 on a SuSE 8.1 with 1GB of RAM.

explain analyze SELECT DISTINCT rset.replicatesetid, tra.value as value,
tra.expressionprofileid, rep.*, epg.expprogeneid, con.ordinal
FROM expprogene epg JOIN reporter rep ON
(epg.reporterid=rep.reporterid), expressionprofile epro,
transformedexpressionprofile tra, internalexpressionprofile int,
meanvalues mea, replicateset rset, replicateset_condition rsco,
condition con,
"CLUSTER" clu, clustertree tre, clusteranalysis an
WHERE epg.expprogeneid IN (80174,84567,...) AND
epg.expprogeneid=epro.expprogeneid
AND epro.expressionprofileid=tra.expressionprofileid AND
tra.expressionprofileid=int.expressionprofileid
AND int.meanvaluesid=mea.meanvaluesid AND
mea.replicatesetid=rset.replicatesetid
AND rset.replicatesetid=rsco.replicatesetid AND
rsco.conditionid=con.conditionid
AND tra.clusterid=clu.clusterid AND clu.clustertreeid=tre.clustertreeid
AND tre.clustertreeid=an.genetreeid
AND an.clusteranalysisid=1 AND con.clusteranalysisid = an.clusteranalysisid
ORDER BY epg.expprogeneid, con.ordinal;

The plan...

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=129132.53..129132.59 rows=2 width=150) (actual
time=12637.224..12676.016 rows=13968 loops=1)
-> Sort (cost=129132.53..129132.54 rows=2 width=150) (actual
time=12637.217..12646.484 rows=13968 loops=1)
Sort Key: epg.expprogeneid, con.ordinal, rset.replicatesetid,
tra.value, tra.expressionprofileid, rep.reporterid, rep.name,
rep.anotation, rep.otherinfo, rep.incidences
-> Nested Loop (cost=62927.42..129132.52 rows=2 width=150)
(actual time=7112.942..12586.314 rows=13968 loops=1)
Join Filter: ("outer".genetreeid = "inner".clustertreeid)
-> Nested Loop (cost=62927.42..127893.86 rows=409
width=162) (actual time=7112.864..11960.324 rows=41904 loops=1)
-> Nested Loop (cost=62927.42..125727.31 rows=369
width=154) (actual time=7112.825..11500.645 rows=13968 loops=1)
-> Merge Join (cost=3.02..7.70 rows=1
width=12) (actual time=0.057..0.073 rows=1 loops=1)
Merge Cond: ("outer".clustertreeid =
"inner".genetreeid)
-> Index Scan using clustertree_pk on
clustertree tre (cost=0.00..4.35 rows=123 width=4) (actual
time=0.017..0.024 rows=2 loops=1)
-> Sort (cost=3.02..3.03 rows=1
width=8) (actual time=0.028..0.030 rows=1 loops=1)
Sort Key: an.genetreeid
-> Index Scan using
clusteranalysis_pk on clusteranalysis an (cost=0.00..3.01 rows=1
width=8) (actual time=0.015..0.018 rows=1 loops=1)
Index Cond:
(clusteranalysisid = 1)
-> Hash Join (cost=62924.39..125715.53
rows=408 width=150) (actual time=7112.758..11455.797 rows=13968 loops=1)
Hash Cond: ("outer".expressionprofileid
= "inner".expressionprofileid)
-> Hash Join (cost=15413.58..78079.33
rows=24339 width=134) (actual time=1489.347..5721.306 rows=41904 loops=1)
Hash Cond: ("outer".expprogeneid
= "inner".expprogeneid)
-> Seq Scan on expressionprofile
epro (cost=0.00..48263.24 rows=2831824 width=8) (actual
time=0.039..3097.656 rows=2839676 loops=1)
-> Hash
(cost=15409.72..15409.72 rows=1546 width=130) (actual
time=43.365..43.365 rows=0 loops=1)
-> Nested Loop
(cost=0.00..15409.72 rows=1546 width=130) (actual time=0.056..40.637
rows=1552 loops=1)
-> Index Scan using
expprogene_pk, expprogene_pk, [......] on expprogene epg
(cost=0.00..10698.83 rows=1546 width=8) (actual time=0.027..15.907
rows=1552 loops=1)
Index Cond:
((expprogeneid = 80174) OR (expprogeneid = 84567) OR (expprogeneid =
83608) OR [OR ....])
-> Index Scan using
reporter_pkey on reporter rep (cost=0.00..3.03 rows=1 width=126)
(actual time=0.009..0.010 rows=1 loops=1552)
Index Cond:
("outer".reporterid = rep.reporterid)
-> Hash (cost=47403.68..47403.68
rows=42853 width=16) (actual time=5623.174..5623.174 rows=0 loops=1)
-> Hash Join
(cost=2369.91..47403.68 rows=42853 width=16) (actual
time=346.040..5538.571 rows=75816 loops=1)
Hash Cond:
("outer".meanvaluesid = "inner".meanvaluesid)
-> Seq Scan on
internalexpressionprofile "int" (cost=0.00..34506.16 rows=2019816
width=8) (actual time=0.003..2231.427 rows=2019816 loops=1)
-> Hash
(cost=2262.78..2262.78 rows=42853 width=16) (actual
time=345.803..345.803 rows=0 loops=1)
-> Nested Loop
(cost=17.49..2262.78 rows=42853 width=16) (actual time=1.965..259.363
rows=75816 loops=1)
-> Hash Join
(cost=17.49..28.42 rows=6 width=16) (actual time=1.881..2.387 rows=9
loops=1)
Hash
Cond: ("outer".replicatesetid = "inner".replicatesetid)
-> Seq
Scan on replicateset rset (cost=0.00..9.58 rows=258 width=4) (actual
time=0.003..0.295 rows=258 loops=1)
-> Hash
(cost=17.47..17.47 rows=6 width=12) (actual time=1.575..1.575 rows=0
loops=1)
->
Hash Join (cost=3.17..17.47 rows=6 width=12) (actual time=0.315..1.557
rows=9 loops=1)

Hash Cond: ("outer".conditionid = "inner".conditionid)

-> Seq Scan on replicateset_condition rsco (cost=0.00..10.83 rows=683
width=8) (actual time=0.004..0.688 rows=683 loops=1)

-> Hash (cost=3.14..3.14 rows=9 width=12) (actual time=0.059..0.059
rows=0 loops=1)

-> Index Scan using clustering_analysis_fk on condition con
(cost=0.00..3.14 rows=9 width=12) (actual time=0.019..0.039 rows=9 loops=1)

Index Cond: (clusteranalysisid = 1)
-> Index Scan
using has_meanvalues_fk on meanvalues mea (cost=0.00..264.03 rows=8669
width=8) (actual time=0.027..13.032 rows=8424 loops=9)
Index
Cond: ("outer".replicatesetid = mea.replicatesetid)
-> Index Scan using comes_from_raw_fk on
transformedexpressionprofile tra (cost=0.00..5.86 rows=1 width=16)
(actual time=0.010..0.018 rows=3 loops=13968)
Index Cond: (tra.expressionprofileid =
"outer".expressionprofileid)
-> Index Scan using _cluster__pk on "CLUSTER" clu
(cost=0.00..3.01 rows=1 width=8) (actual time=0.009..0.010 rows=1
loops=41904)
Index Cond: ("outer".clusterid = clu.clusterid)
Total runtime: 12696.289 ms
(48 rows)

I tried setting the enable_seq_scan to off and the query's runtime
returned by the explain analyze is 4000ms.
Why postgre is not using the indexes?
What is the real impact of having such a big 'IN' clause?

Thanks in advance,

Luis Cornide

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2005-08-11 12:08:34 Re: Speedier count(*)
Previous Message Tino Wildenhain 2005-08-11 10:52:16 Re: Speedier count(*)