From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | Robert Poor <rdpoor(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: slow self-join query |
Date: | 2012-03-18 00:57:08 |
Message-ID: | CAOR=d=00LZM8=YCR_fqVFyKJjmidYMMLHm_v=+a0qXYfkCVkeQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Sat, Mar 17, 2012 at 2:56 PM, Robert Poor <rdpoor(at)gmail(dot)com> wrote:
> Disclaimer: this is a re-post, since I wasn't subscribed the first
> time I posted. Pardon if this is a duplicate.]
>
> The following query is abysmally slow (e.g. 7 hours+). The goal is to
> find "among the users that follow user #1, who do they also follow?"
> and to count the latter.
>
> SELECT L2.followed_id as followed_id, COUNT(U1.id) AS count
> FROM users AS U1
> INNER JOIN links AS L1 ON L1.follower_id = U1.id
> INNER JOIN links AS L2 ON L2.follower_id = U1.id
> WHERE U1.type = 'User::Twitter'
> AND L1.followed_id = 1
> GROUP BY L2.followed_id
>
> Here's the rest of the info.
>
> === versions
>
> psql (9.1.2, server 8.3.14)
>
> === schema
>
> create_table "users", :force => true do |t|
> t.string "type"
> end
>
> create_table "links", :force => true do |t|
> t.integer "followed_id"
> t.integer "follower_id"
> end
>
> add_index "links", ["follower_id"], :name => "index_links_on_follower_id"
> add_index "links", ["followed_id", "follower_id"], :name =>
> "index_links_on_followed_id_and_follower_id", :unique => true
> add_index "links", ["followed_id"], :name => "index_links_on_followed_id"
>
> === # of rows
>
> users: 2,525,539
> links: 4,559,390
>
> === explain
>
> "HashAggregate (cost=490089.52..490238.78 rows=11941 width=8)"
> " -> Hash Join (cost=392604.44..483190.22 rows=1379860 width=8)"
> " Hash Cond: (f1.follower_id = u1.id)"
> " -> Bitmap Heap Scan on links f1 (cost=14589.95..55597.70
> rows=764540 width=4)"
> " Recheck Cond: (followed_id = 1)"
> " -> Bitmap Index Scan on index_links_on_followed_id
> (cost=0.00..14398.82 rows=764540 width=0)"
> " Index Cond: (followed_id = 1)"
> " -> Hash (cost=300976.98..300976.98 rows=4559881 width=12)"
> " -> Hash Join (cost=94167.40..300976.98 rows=4559881 width=12)"
> " Hash Cond: (f2.follower_id = u1.id)"
> " -> Seq Scan on links f2 (cost=0.00..77049.81
> rows=4559881 width=8)"
> " -> Hash (cost=53950.20..53950.20 rows=2526496 width=4)"
> " -> Seq Scan on users u1
> (cost=0.00..53950.20 rows=2526496 width=4)"
> " Filter: ((type)::text =
> 'User::Twitter'::text)"
>
> === other comments
>
> I'm assuming I'm doing something obviously stupid and that the above
> info will be sufficient for anyone skilled in the art to detect the
> problem. However, if needed I will gladly invest the time
> to create a subset of the data in order to run EXPLAIN ANALYZE. (With
> the whole dataset, it requires > 7 hours to complete the query. I
> don't want to go down that path again!)
Do you have an index on users.type?
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Poor | 2012-03-18 14:37:24 | Re: slow self-join query |
Previous Message | Robert Poor | 2012-03-17 20:56:02 | slow self-join query |