Re: Join query on 1M row table slow

From: CSN <cool_screen_name90001(at)yahoo(dot)com>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Join query on 1M row table slow
Date: 2004-02-10 22:37:27
Message-ID: 20040210223727.29023.qmail@web40605.mail.yahoo.com
Lists: pgsql-general


I disabled enable_hashagg and enable_nestloop. Appears
to have made both queries worse :(
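
(For reference, this is roughly what I ran before the two EXPLAIN
ANALYZEs below -- just session-level SETs in psql, assuming that's the
right way to do it:)

=> SET enable_hashagg = off;   -- discourage HashAggregate plans
=> SET enable_nestloop = off;  -- discourage nested-loop joins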

QUERY 1:

=> explain analyze select p.* from product_categories pc
   inner join products p on pc.product_id = p.id
   where pc.category_id = 1016
   order by p.title limit 25 offset 10000;

                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=65999.78..65999.78 rows=1 width=290) (actual time=7736.029..7736.029 rows=0 loops=1)
   ->  Sort  (cost=65997.31..65999.78 rows=986 width=290) (actual time=7723.794..7730.352 rows=2358 loops=1)
         Sort Key: p.title
         ->  Merge Join  (cost=65306.35..65948.28 rows=986 width=290) (actual time=7028.790..7614.223 rows=2358 loops=1)
               Merge Cond: ("outer".product_id = "inner".id)
               ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=102.115..105.357 rows=2358 loops=1)
                     Sort Key: pc.product_id
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.349..94.173 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Sort  (cost=61650.04..61963.62 rows=125430 width=290) (actual time=6926.394..7272.130 rows=124521 loops=1)
                     Sort Key: p.id
                     ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.102..2855.358 rows=124753 loops=1)
 Total runtime: 8003.067 ms
(13 rows)

QUERY 2:

=> explain analyze select p.* from products p
   where p.id in (select product_id from product_categories pc
                  where pc.category_id = 1016)
   order by p.title limit 25 offset 0;

                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10931.85..10931.91 rows=25 width=290) (actual time=3667.396..3667.526 rows=25 loops=1)
   ->  Sort  (cost=10931.85..10932.13 rows=111 width=290) (actual time=3667.384..3667.453 rows=25 loops=1)
         Sort Key: p.title
         ->  Hash Join  (cost=3661.52..10928.08 rows=111 width=290) (actual time=111.198..1615.324 rows=2358 loops=1)
               Hash Cond: ("outer".id = "inner".product_id)
               ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.113..1039.900 rows=124753 loops=1)
               ->  Hash  (cost=3661.24..3661.24 rows=111 width=4) (actual time=110.932..110.932 rows=0 loops=1)
                     ->  Unique  (cost=3656.31..3661.24 rows=111 width=4) (actual time=97.255..106.798 rows=2358 loops=1)
                           ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=97.247..99.998 rows=2358 loops=1)
                                 Sort Key: pc.product_id
                                 ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.327..88.436 rows=2358 loops=1)
                                       Index Cond: (category_id = 1016)
 Total runtime: 3669.479 ms
(13 rows)

> upping the analyze target on those two tables may help a bit too.

How exactly do I do that?
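
From skimming the docs, I'm guessing it's ALTER TABLE ... SET STATISTICS
per column, followed by a fresh ANALYZE. A rough sketch of what I'd try
on the columns involved in the join -- correct me if this is off; 100 is
just an arbitrary bump over the default target of 10:

=> ALTER TABLE product_categories ALTER COLUMN category_id SET STATISTICS 100;
=> ALTER TABLE product_categories ALTER COLUMN product_id  SET STATISTICS 100;
=> ALTER TABLE products           ALTER COLUMN id          SET STATISTICS 100;
=> ANALYZE product_categories;   -- rebuild stats with the larger sample
=> ANALYZE products;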

SELECT * from thanks limit 1000
;)

CSN

--- "scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> wrote:
> Well, it looks like the number of rows estimate for
> the nested loop in the
> first query and the hash agg in the second are off
> by a factor 3 for the
> first query, and a factor of 20 for the second. Try
> running number 1 with
> set enable_nestloop = off
> and see if the first one gets faster.
>
> You might also wanna try turning off hash aggregate
> on the second one and
> see how that works.
>
> upping the analyze target on those two tables may
> help a bit too.
>
> On Tue, 10 Feb 2004, CSN wrote:
>
> >
> > Doh! Yeah, now I remember ;)
> >
> > QUERY 1:
> >
> > => explain analyze select p.* from product_categories pc
> >    inner join products p on pc.product_id = p.id
> >    where pc.category_id = 1016
> >    order by p.title limit 25 offset 0;
> >
> >                              QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=9595.99..9596.05 rows=25 width=290) (actual time=514.808..514.942 rows=25 loops=1)
> >    ->  Sort  (cost=9595.99..9598.45 rows=986 width=290) (actual time=514.794..514.868 rows=25 loops=1)
> >          Sort Key: p.title
> >          ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290) (actual time=0.672..421.732 rows=2358 loops=1)
> >                ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.343..125.762 rows=2358 loops=1)
> >                      Index Cond: (category_id = 1016)
> >                ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.075..0.083 rows=1 loops=2358)
> >                      Index Cond: ("outer".product_id = p.id)
> >  Total runtime: 516.174 ms
> > (9 rows)
> >
> >
> > QUERY 2:
> >
> > => explain analyze select p.* from products p
> >    where p.id in (select product_id from product_categories pc
> >                   where pc.category_id = 1016)
> >    order by p.title limit 25 offset 0;
> >
> >                              QUERY PLAN
> > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=4282.18..4282.24 rows=25 width=290) (actual time=447.852..447.979 rows=25 loops=1)
> >    ->  Sort  (cost=4282.18..4282.46 rows=111 width=290) (actual time=447.836..447.904 rows=25 loops=1)
> >          Sort Key: p.title
> >          ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290) (actual time=104.256..358.182 rows=2358 loops=1)
> >                ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4) (actual time=103.922..114.836 rows=2358 loops=1)
> >                      ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.401..92.253 rows=2358 loops=1)
> >                            Index Cond: (category_id = 1016)
> >                ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.069..0.076 rows=1 loops=2358)
> >                      Index Cond: (p.id = "outer".product_id)
> >  Total runtime: 449.370 ms
> > (10 rows)
> >
> >
> > -CSN
> >
> >
> > --- "scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> wrote:
> > > On Tue, 10 Feb 2004, CSN wrote:
> > >
> > > > > 2. Vacuum analyze the tables concerned and
> post
> > > the
> > > > > output of EXPLAIN ANALYSE
> > > > > with your query.
> > > >
> > > > => explain analyze;
> > > >
> > > > results in:
> > > >
> > > > ERROR: syntax error at or near ";" at
> character
> > > 16
> > >
> > > No silly. you do:
> > >
> > > explain analyze select ... (rest of the
> query...)
> > >
> > > and it runs the query and tells you how long
> each
> > > bit took and what it
> > > THOUGHT it would get back in terms of number of
> rows
> > > and what it actually
> > > got back.
> > >
> > > Let us know...
> > >
> >
> >

