Re: Help with a seq scan on multi-million row table

From: <ogjunk-pgjedan(at)yahoo(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Help with a seq scan on multi-million row table
Date: 2006-05-11 01:34:10
Message-ID: 20060511013410.83818.qmail@web50307.mail.yahoo.com
Lists: pgsql-sql

Aha! SET enable_hashjoin = off did the trick.
The PG version is: 8.0.3

NB: I removed that redundant "DISTINCT" after the SELECT.

EXPLAIN ANALYZE select userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ from user_url_tag userurltag0_, user_url userurl1_ where (((userurl1_.user_id=1 ))AND((userurltag0_.user_url_id=userurl1_.id ))) group by userurltag0_.tag order by count(*)DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=155766.79..155774.81 rows=3207 width=10) (actual time=2387.756..2396.578 rows=2546 loops=1)
   Sort Key: count(*)
   ->  HashAggregate  (cost=155572.02..155580.03 rows=3207 width=10) (actual time=2365.643..2376.626 rows=2546 loops=1)
         ->  Nested Loop  (cost=0.00..155552.68 rows=3867 width=10) (actual time=0.135..2222.028 rows=8544 loops=1)
               ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2798.12 rows=963 width=4) (actual time=0.067..9.744 rows=1666 loops=1)
                     Index Cond: (user_id = 1)
               ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
                     Index Cond: (userurltag0_.user_url_id = "outer".id)
 Total runtime: 2405.691 ms
(9 rows)

Are you still interested in the output with the "second-choice join type" disabled as well? If so, please tell me which join types those are; this is a bit beyond me. :(
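
For reference, PostgreSQL has three join methods: hash join, merge join, and nested loop, each with its own planner switch (enable_hashjoin, enable_mergejoin, enable_nestloop). Since the plan above switched to a Nested Loop once hash joins were disabled, nested loop appears to be the second choice here, so turning it off as well should leave a merge join. A minimal sketch of collecting that third plan in one psql session, assuming the same query as above:

```sql
-- Session-level planner switches; they affect only the current connection.
SET enable_hashjoin = off;   -- join method used in the original slow plan
SET enable_nestloop = off;   -- join method chosen once hash joins were disabled

EXPLAIN ANALYZE
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
FROM user_url_tag userurltag0_, user_url userurl1_
WHERE userurl1_.user_id = 1
  AND userurltag0_.user_url_id = userurl1_.id
GROUP BY userurltag0_.tag
ORDER BY COUNT(*) DESC;

-- Restore the defaults afterwards.
RESET enable_hashjoin;
RESET enable_nestloop;
```

These switches only make the planner strongly avoid a join method; they do not forbid it outright, so it is worth checking which join node actually shows up in the output.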

Is there a way to force PG to use the index automatically? This query is executed from something called Hibernate, and I'm not sure if that will let me set enable_hashjoin=off through its API...
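
One possible approach, offered as an untested sketch: if the application can send raw SQL on the same connection that runs the query, SET LOCAL confines the setting to a single transaction. Alternatively, a per-role default can be stored on the server with ALTER USER. The role name app_user below is hypothetical, and whether Hibernate exposes a convenient hook for the first option is not something this sketch assumes.

```sql
-- Option 1: scope the switch to one transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;  -- reverts automatically at COMMIT or ROLLBACK
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
FROM user_url_tag userurltag0_, user_url userurl1_
WHERE userurl1_.user_id = 1
  AND userurltag0_.user_url_id = userurl1_.id
GROUP BY userurltag0_.tag
ORDER BY COUNT(*) DESC;
COMMIT;

-- Option 2: store a default for the role the application connects as.
-- app_user is a hypothetical role name; the setting applies to new sessions only.
ALTER USER app_user SET enable_hashjoin = off;
```

Either way this is a workaround: it turns off hash joins for every query in scope, not only this one.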

Thanks,
Otis

----- Original Message ----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ogjunk-pgjedan(at)yahoo(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Sent: Wednesday, May 10, 2006 8:27:01 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table

<ogjunk-pgjedan(at)yahoo(dot)com> writes:
> ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
>       Hash Cond: ("outer".user_url_id = "inner".id)
>       ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
>       ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
>             ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
>                   Index Cond: (user_id = 1)

Hm, I'm not sure why it's choosing that join plan. A nestloop indexscan
wouldn't be terribly cheap, but just counting on my fingers it seems
like it ought to come in at less than 100000 cost units. What do you
get if you set enable_hashjoin off? (Then try disabling its
second-choice join type too --- I'm interested to see EXPLAIN ANALYZE
output for all three join types.)

What PG version is this exactly?

regards, tom lane

