Help with a seq scan on multi-million row table

From: <ogjunk-pgjedan(at)yahoo(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Help with a seq scan on multi-million row table
Date: 2006-05-10 18:13:59
Message-ID: 20060510181359.58201.qmail@web50306.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello,

I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential scan on a multi-million row table. I _thought_ I had all the appropriate indices, but apparently I do not. I was wondering if anyone can spot a way I can speed up this query.
The query currently takes... *gulp*: 381119.201 ms :(

There are only 2 tables in the game: user_url and user_url_tag. The latter has FKs pointing to the former. The sequential scan happens on the latter - user_url_tag:

EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ from user_url_tag userurltag0_, user_url userurl1_ WHERE (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id ))) GROUP BY userurltag0_.tag ORDER BY count(*) DESC;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=140972.22..140996.28 rows=3207 width=10) (actual time=381082.868..381110.094 rows=2546 loops=1)
-> Sort (cost=140972.22..140980.24 rows=3207 width=10) (actual time=381082.858..381091.733 rows=2546 loops=1)
Sort Key: count(*), userurltag0_.tag
-> HashAggregate (cost=140777.45..140785.46 rows=3207 width=10) (actual time=381032.844..381064.068 rows=2546 loops=1)
-> Hash Join (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
Hash Cond: ("outer".user_url_id = "inner".id)
-> Seq Scan on user_url_tag userurltag0_ (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
-> Hash (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
-> Index Scan using ix_user_url_user_id_url_id on user_url userurl1_ (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
Index Cond: (user_id = 1)
Total runtime: 381119.201 ms
(11 rows)


This is what the two tables look like (extra colums removed):

Table "public.user_url_tag"
Column | Type | Modifiers
-------------+-----------------------+--------------------------------------------------------------
id | integer | not null default nextval('public.user_url_tag_id_seq'::text)
user_url_id | integer |
tag | character varying(64) |
Indexes:
"pk_user_url_tag_id" PRIMARY KEY, btree (id)
"ix_user_url_tag_tag" btree (tag)
"ix_user_url_tag_user_url_id" btree (user_url_id)
Foreign-key constraints:
"fk_user_url_tag_user_url_id" FOREIGN KEY (user_url_id) REFERENCES user_url(id)

Table "public.user_url"
Column | Type | Modifiers
------------------+-----------------------------+----------------------------------------------------------
id | integer | not null default nextval('public.user_url_id_seq'::text)
user_id | integer |
url_id | integer |
Indexes:
"pk_user_url_id" PRIMARY KEY, btree (id)
"ix_user_url_url_id_user_id" UNIQUE, btree (url_id, user_id)
"ix_user_url_user_id_url_id" UNIQUE, btree (user_id, url_id)


Does anyone see a way to speed up this s-l-o-w query?
I cache DB results, but I'd love to get rid of that sequential scan.

Thanks,
Otis

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Andreas Kretschmer 2006-05-10 18:32:35 Re: Multi-column index not used, new flipped column index is
Previous Message Markus Schaber 2006-05-10 16:46:10 Re: Multi-column index not used, new flipped column index is