Re: sub-query optimization

From: Brad Hilton <bhilton(at)vpop(dot)net>
To: Tomasz Myrta <jasiek(at)klaster(dot)net>
Cc: jasiek(at)serwer(dot)skawsoft(dot)com(dot)pl, pgsql-sql(at)postgresql(dot)org
Subject: Re: sub-query optimization
Date: 2003-02-17 17:20:07
Message-ID: 1045502407.7192.10.camel@aragorn.vpop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Fri, 2003-02-14 at 17:52, Tomasz Myrta wrote:
> It looks ok now. Probably it needs some cosmetics changes.
> >
> > select a.*
> > from
> > categories c cross join category_map m
> > join article_categories ac on (c.id = ac.category_id and m.child_id =
> > ac.category_id)
> > join articles a on (a.id = ac.article_id)
> > where
> > m.parent_id=1 and
> > not c.restrict_views and
> > m.child_id = c.id and
> > a.post_status='publish'
> >
> Can you send explain analyze this query? Maybe table
> joins should be reordered or they need other indexes they have?
>

Sure, I appreciate your interest and help. I modified the above query
to do a "select distinct a.*" since I need to fetch all fields and need
them grouped by id. I'm also including a query which uses derived
tables and avoids the penalty of "distinct a.*." The second query is
much faster because of this, but it's still slower than I was hoping.
~3 seconds for a query isn't going to fly. :( If you can see any
warning signs from the "explain" output, I'd love to hear from you.

Thanks,
-Brad

----------------------------------
explain analyze select distinct a.*
from
categories c cross join category_map m
join article_categories ac on (c.id = ac.category_id and m.child_id =
ac.category_id)
join articles a on (a.id = ac.article_id)
where
m.parent_id=1 and
not c.restrict_views and
m.child_id = c.id and
a.post_status='publish'

---------------------------------

Unique (cost=61058.89..68058.89 rows=20000 width=203) (actual
time=7649.35..8465.96 rows=100000 loops=1)
-> Sort (cost=61058.89..61558.89 rows=200000 width=203) (actual
time=7649.35..7898.56 rows=200000 loops=1)
Sort Key: a.id, a.user_id, a.blog_id, a.remote_ip,
a.create_time, a.publish_time, a.update_time, a.title, a.body,
a.long_body, a.excerpt, a.post_status, a.publish_date
-> Hash Join (cost=5133.25..17614.25 rows=200000 width=203)
(actual time=590.36..6029.65 rows=200000 loops=1)
Hash Cond: ("outer".article_id = "inner".id)
-> Hash Join (cost=18.33..6499.33 rows=200000 width=16)
(actual time=2.08..801.69 rows=200000 loops=1)
Hash Cond: ("outer".category_id = "inner".id)
-> Seq Scan on article_categories ac
(cost=0.00..2981.00 rows=200000 width=8) (actual time=0.01..293.28
rows=200000 loops=1)
-> Hash (cost=18.01..18.01 rows=131 width=8)
(actual time=1.72..1.72 rows=0 loops=1)
-> Hash Join (cost=6.64..18.01 rows=131
width=8) (actual time=0.84..1.58 rows=131 loops=1)
Hash Cond: ("outer".child_id =
"inner".id)
-> Seq Scan on category_map m
(cost=0.00..9.07 rows=131 width=4) (actual time=0.02..0.43 rows=131
loops=1)
Filter: (parent_id = 1)
-> Hash (cost=6.31..6.31 rows=131
width=4) (actual time=0.40..0.40 rows=0 loops=1)
-> Seq Scan on categories c
(cost=0.00..6.31 rows=131 width=4) (actual time=0.02..0.24 rows=131
loops=1)
Filter: (NOT
restrict_views)
-> Hash (cost=2885.00..2885.00 rows=100000 width=187)
(actual time=588.13..588.13 rows=0 loops=1)
-> Seq Scan on articles a (cost=0.00..2885.00
rows=100000 width=187) (actual time=0.03..429.67 rows=100000 loops=1)
Filter: (post_status = 'publish'::character
varying)
Total runtime: 18544.75 msec

--------------------------------
explain analyze select articles.* from
(select article_id from
article_categories ac, categories c, category_map cm
where
ac.category_id = c.id and
ac.category_id = cm.child_id and
c.id = cm.child_id and
c.restrict_views = FALSE and
cm.parent_id = 1
group by article_id) X
join articles on (articles.id = X.article_id)
where
articles.post_status = 'publish'

--------------------------------
Merge Join (cost=26538.07..30754.75 rows=20000 width=191) (actual
time=1736.36..3079.64 rows=100000 loops=1)
Merge Cond: ("outer".id = "inner".article_id)
-> Index Scan using articles_pkey on articles (cost=0.00..3616.68
rows=100000 width=187) (actual time=0.06..674.13 rows=100000 loops=1)
Filter: (post_status = 'publish'::character varying)
-> Sort (cost=26538.07..26588.07 rows=20000 width=16) (actual
time=1736.27..1800.42 rows=100000 loops=1)
Sort Key: x.article_id
-> Subquery Scan x (cost=24109.30..25109.30 rows=20000
width=16) (actual time=1073.54..1594.71 rows=100000 loops=1)
-> Group (cost=24109.30..25109.30 rows=20000 width=16)
(actual time=1073.54..1447.94 rows=100000 loops=1)
-> Sort (cost=24109.30..24609.30 rows=200000
width=16) (actual time=1073.52..1191.72 rows=200000 loops=1)
Sort Key: ac.article_id
-> Hash Join (cost=18.66..6499.66
rows=200000 width=16) (actual time=2.14..777.38 rows=200000 loops=1)
Hash Cond: ("outer".category_id =
"inner".id)
-> Seq Scan on article_categories ac
(cost=0.00..2981.00 rows=200000 width=8) (actual time=0.01..304.06
rows=200000 loops=1)
-> Hash (cost=18.33..18.33 rows=131
width=8) (actual time=1.80..1.80 rows=0 loops=1)
-> Hash Join (cost=6.97..18.33
rows=131 width=8) (actual time=0.78..1.65 rows=131 loops=1)
Hash Cond:
("outer".child_id = "inner".id)
-> Seq Scan on
category_map cm (cost=0.00..9.07 rows=131 width=4) (actual
time=0.01..0.54 rows=131 loops=1)
Filter: (parent_id =
1)
-> Hash (cost=6.64..6.64
rows=131 width=4) (actual time=0.39..0.39 rows=0 loops=1)
-> Seq Scan on
categories c (cost=0.00..6.64 rows=131 width=4) (actual time=0.04..0.25
rows=131 loops=1)
Filter:
(restrict_views = false)
Total runtime: 3221.05 msec

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tomasz Myrta 2003-02-17 18:11:44 Re: sub-query optimization
Previous Message Terry Yapt 2003-02-17 16:25:40 Re: rownum