Re: Need help optimizing this query

From: "Pat Maddox" <pergesu(at)gmail(dot)com>
To: "Michael Glaesemann" <grzm(at)seespotcode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need help optimizing this query
Date: 2007-07-18 21:48:35
Message-ID: 810a540e0707181448y135015edt4a861560db370f58@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 7/18/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
>
> On Jul 18, 2007, at 16:12 , Pat Maddox wrote:
>
> > ERROR: invalid reference to FROM-clause entry for table "video_views"
> > LINE 20: JOIN assets ON (video_views.video_id=videos.id)
> > ^
> > HINT: There is an entry for table "video_views", but it cannot be
> > referenced from this part of the query.
>
> It's because I mismatched the JOIN clauses during my copy-and-paste :(
>
> > On 7/18/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
>
> >> FROM video_views
> >> JOIN assets ON (video_views.video_id=videos.id)
> >> JOIN videos ON (video_views.asset_id=assets.id)
>
> This should be
>
> FROM video_views
> JOIN assets ON (video_views.asset_id=assets.id)
> JOIN videos ON (video_views.video_id=videos.id)
>
> Do you have the EXPLAIN ANALYE output of the query?
>
> Michael Glaesemann
> grzm seespotcode net
>
>
>

For some reason the functions you wrote are giving me trouble (there's
a BIGINT involved, I tried changing the functions around but kept
having issues). So here's the full query, hopefully formatted better:

SELECT
SUM(CASE WHEN (hit IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
END) AS count_hits_console,
SUM(CASE WHEN (hit IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0
END) AS count_hits_remote,
SUM(CASE WHEN (played IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
END) AS count_played_console,
SUM(CASE WHEN (played IS TRUE AND logged_in IS FALSE) THEN 1 ELSE
0 END) AS count_played_remote,
SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS TRUE) THEN 1
ELSE 0 END) AS count_downloaded_console,
SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS FALSE) THEN 1
ELSE 0 END) AS count_downloaded_remote,
SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
logged_in IS TRUE) THEN assets.size ELSE 0 END) as
download_size_console,
SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
logged_in IS FALSE) THEN assets.size ELSE 0 END) AS
download_size_remote,
videos.id,
videos.title,
videos.guid
FROM video_views
JOIN assets ON (video_views.asset_id=assets.id)
JOIN videos on (video_views.video_id=videos.id)
WHERE videos.company_id=1
GROUP BY videos.id,
videos.title,
videos.guid
ORDER BY count_hits_remote DESC
LIMIT 100

and here's the EXPLAIN ANALYZE output:

Limit (cost=127072.90..127073.12 rows=87 width=64) (actual
time=2636.560..2636.567 rows=20 loops=1)
-> Sort (cost=127072.90..127073.12 rows=87 width=64) (actual
time=2636.558..2636.562 rows=20 loops=1)
Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND
(video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END)
-> HashAggregate (cost=127067.49..127070.10 rows=87
width=64) (actual time=2636.481..2636.506 rows=20 loops=1)
-> Hash Join (cost=880.96..125995.46 rows=38983
width=64) (actual time=24.904..2635.719 rows=122 loops=1)
Hash Cond: (video_views.asset_id = assets.id)
-> Hash Join (cost=195.96..124433.01 rows=39009
width=60) (actual time=8.327..2618.982 rows=122 loops=1)
Hash Cond: (video_views.video_id = videos.id)
-> Seq Scan on video_views
(cost=0.00..101352.70 rows=5998470 width=12) (actual
time=0.031..1410.231 rows=5998341 loops=1)
-> Hash (cost=194.87..194.87 rows=87
width=52) (actual time=1.001..1.001 rows=90 loops=1)
-> Bitmap Heap Scan on videos
(cost=4.93..194.87 rows=87 width=52) (actual time=0.111..0.840 rows=90
loops=1)
Recheck Cond: (company_id = 1)
-> Bitmap Index Scan on
index_videos_on_company_id (cost=0.00..4.90 rows=87 width=0) (actual
time=0.079..0.079 rows=90 loops=1)
Index Cond: (company_id = 1)
-> Hash (cost=487.78..487.78 rows=15778
width=12) (actual time=16.527..16.527 rows=15778 loops=1)
-> Seq Scan on assets (cost=0.00..487.78
rows=15778 width=12) (actual time=0.023..9.601 rows=15778 loops=1)
Total runtime: 2637.043 ms
(17 rows)

That one runs reasonably fine, because there are only 20 videos being
returned and a handful of video views associated with them. In the
real query there are about 1k videos and a couple million views. That
took about 80 minutes to run, according to logs.

Pat

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2007-07-18 22:02:51 Re: Feature request: Per database search_path
Previous Message Alan Hodgson 2007-07-18 21:34:46 Re: DBI/DBD::Pg and transactions