From: | "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de> |
---|---|
To: | "Dan Harris" <fbsd(at)drivefaster(dot)net>, "PostgreSQL Performance" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Planner doing seqscan before indexed join |
Date: | 2007-03-30 07:39:31 |
Message-ID: | CA896D7906BF224F8A6D74A1B7E54AB301750A13@JENMAIL01.ad.intershop.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
You may try to change the planner's opinion using sub queries. Something
like:
select * from
eventactivity,
(select * from
keyword_incidents,
eventmain,
eventgeo
where
eventmain.incidentid = keyword_incidents.incidentid
and eventgeo.incidentid = keyword_incidents.incidentid
and ( recordtext like '%JOSE CHAVEZ%' )
)foo
where eventactivity.incidentid = foo.incidentid
order by foo.entrydate limit 10000;
HTH,
Marc
-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Dan Harris
Sent: Thursday, March 29, 2007 4:22 AM
To: PostgreSQL Performance
Subject: [PERFORM] Planner doing seqscan before indexed join
8.0.3 - Linux 2.6.18..
Freshly vacuumed and analyzed
This database has been humming along fine for a while now, but I've got
one of those sticky queries that is taking much too long to finish.
After some digging, I've found that the planner is choosing to apply a
necessary seq scan to the table. Unfortunately, it's scanning the whole
table, when it seems that it could have joined it to a smaller table
first and reduce the amount of rows it would have to scan dramatically (
70 million to about 5,000 ).
The table "eventactivity" has about 70million rows in it, index on
"incidentid"
The table "keyword_incidents" is a temporary table and has incidentid as
its primary key. It contains
5125 rows. eventmain and eventgeo both have 2.1 million. My hope is that
I can convince the planner to do the
join to keyword_incidents *first* and then do the seq scan for the
LIKE condition. Instead, it seems that it's seqscanning the whole 70
million rows first and then doing the join, which takes a lot longer
than I'd like to wait for it. Or, maybe I'm misreading the explain
output?
Thanks again
-Dan
---------------------------------
Here's the query:
explain analyze
select
* from
keyword_incidents,
eventactivity,
eventmain,
eventgeo
where
eventmain.incidentid = keyword_incidents.incidentid and
eventgeo.incidentid = keyword_incidents.incidentid and
eventactivity.incidentid = keyword_incidents.incidentid
and ( recordtext like '%JOSE CHAVEZ%' )
order by eventmain.entrydate limit 10000;
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------
Limit (cost=2388918.07..2388918.08 rows=1 width=455) (actual
time=81771.186..81771.292 rows=26 loops=1)
-> Sort (cost=2388918.07..2388918.08 rows=1 width=455) (actual
time=81771.180..81771.215 rows=26 loops=1)
Sort Key: eventmain.entrydate
-> Nested Loop (cost=0.00..2388918.06 rows=1 width=455)
(actual time=357.389..81770.982 rows=26 loops=1)
-> Nested Loop (cost=0.00..2388913.27 rows=1
width=230) (actual time=357.292..81767.385 rows=26 loops=1)
-> Nested Loop (cost=0.00..2388909.33 rows=1
width=122) (actual time=357.226..81764.501 rows=26 loops=1)
-> Seq Scan on eventactivity
(cost=0.00..2388874.46 rows=7 width=84) (actual time=357.147..81762.582
rows=27 loops=1)
Filter: ((recordtext)::text ~~ '%JOSE
CHAVEZ%'::text)
-> Index Scan using keyword_incidentid_pkey
on keyword_incidents (cost=0.00..4.97 rows=1 width=38) (actual
time=0.034..0.036 rows=1 loops=27)
Index Cond:
(("outer".incidentid)::text = (keyword_incidents.incidentid)::text)
-> Index Scan using eventgeo_incidentid_idx on
eventgeo (cost=0.00..3.93 rows=1 width=108) (actual
time=0.076..0.081 rows=1 loops=26)
Index Cond: (("outer".incidentid)::text =
(eventgeo.incidentid)::text)
-> Index Scan using eventmain_incidentid_idx on
eventmain (cost=0.00..4.78 rows=1 width=225) (actual
time=0.069..0.075 rows=1 loops=26)
Index Cond: (("outer".incidentid)::text =
(eventmain.incidentid)::text)
Total runtime: 81771.529 ms
(15 rows)
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
From | Date | Subject | |
---|---|---|---|
Next Message | Gaetano Mendola | 2007-03-30 09:32:49 | Wrong plan sequential scan instead of an index one |
Previous Message | Marc Mamin | 2007-03-30 07:07:53 | Re: Scaling SELECT:s with the number of disks on a stripe |