From: CSN <cool_screen_name90001(at)yahoo(dot)com>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Join query on 1M row table slow
Date: 2004-02-10 21:48:20
Message-ID: 20040210214820.16165.qmail@web40605.mail.yahoo.com
Lists: pgsql-general

--- "scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> wrote:
> On Tue, 10 Feb 2004, CSN wrote:
>
> > I have a pretty simple select query that joins a
> table
> > (p) with 125K rows with another table (pc) with
> almost
> > one million rows:
> >
> > select p.*
> > from product_categories pc
> > inner join products p
> > on pc.product_id = p.id
> > where pc.category_id = $category_id
> > order by p.title
> > limit 25
> > offset $offset
> >
> > The query usually takes about five seconds to
> execute
> > (all other PG queries perform fast enough). I have
> > indexes on everything needed, and EXPLAIN shows
> > they're being used. Is there anything else I can
> do to
> > improve performance - such as tweaking some
> settings
> > in the config?
>
> The problem is that in order to do an offset / limit
> on such a set,
> postgresql is gonna have to generate $offset + limit
> of the joined set.
>
> So, it's quite likely that it's generating the whole
> set first.
>
> It also looks odd having a select p.* from
> product_cat pc, but not
> selecting anything from the pc table.
>
> Could this be rewritten to something like
>
> select p.* from products p where p.id in (select
> product_id from product_categories pc where
> pc.category_id = $category_id)
> order by p.title limit 25 offset $offset
>
> ? Or is that equivalent?
>
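That makes sense - to make it concrete (hypothetical offset, same tables), a page deep into the results like

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = 1016
order by p.title
limit 25
offset 975

would force the sort to produce the first 1000 joined rows just to throw 975 of them away, so the cost grows with $offset.
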
I think that probably improves things (lower cost? - see my other post):

explain select p.* from products p where p.id in (
select product_id from product_categories pc where
pc.category_id = 1016) order by p.title limit 25
offset 0;

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                     Index Cond: (p.id = "outer".product_id)
(9 rows)
I figured the limit/offset was probably causing the problem. What's
weird is that when the same query is executed again, it seems much
faster - some sort of caching maybe?
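
One way to check, I suppose - just a sketch, same query as above: run
it twice under EXPLAIN ANALYZE and compare the actual times. The first
run has to pull the pages off disk, while a repeat should be served
from shared_buffers / the OS cache, so a big drop between runs would
point at caching:

explain analyze select p.* from products p where p.id in (
select product_id from product_categories pc where
pc.category_id = 1016) order by p.title limit 25
offset 0;
-- run it a second time and compare the reported actual/total runtime
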
(table pc is just product_id <=> category_id - I don't really need
the category_id)
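
And since nothing is selected from pc, an EXISTS form might be worth
trying too - just a sketch, which should be equivalent here since it
only probes pc for a matching row:

select p.* from products p
where exists (select 1 from product_categories pc
              where pc.product_id = p.id
              and pc.category_id = 1016)
order by p.title limit 25 offset 0;
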
CSN