Re: Need help optimizing this query

From: "Pat Maddox" <pergesu(at)gmail(dot)com>
To: "Michael Glaesemann" <grzm(at)seespotcode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need help optimizing this query
Date: 2007-07-18 22:34:37
Message-ID: 810a540e0707181534w267ae41au570ed0ca1a4edf84@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 7/18/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
>
> On Jul 18, 2007, at 16:48 , Pat Maddox wrote:
>
> > For some reason the functions you wrote are giving me trouble (there's
> > a BIGINT involved, I tried changing the functions around but kept
> > having issues).
>
> You might try these, if you're interested.
>
> CREATE OR REPLACE FUNCTION ternary(BOOLEAN, BIGINT, BIGINT)
> RETURNS BIGINT
> LANGUAGE SQL AS $_$SELECT CASE WHEN $1 THEN $2 ELSE $3 END$_$;
>
> CREATE OR REPLACE FUNCTION value_when(BOOLEAN, BIGINT)
> RETURNS BIGINT
> LANGUAGE SQL AS $_$SELECT ternary($1,$2,0)$_$;
>
> CREATE OR REPLACE FUNCTION one_when(BOOLEAN)
> RETURNS BIGINT
> LANGUAGE SQL as $_$SELECT value_when($1,1)$_$;
>
> > So here's the full query, hopefully formatted better:
>
> I'm still curious about why the planner is choosing a hash join over
> using the indexes on the foreign keys, but that might be because the
> tables are relatively small.
>
> > That one runs reasonably fine, because there are only 20 videos being
> > returned and a handful of video views associated with them. In the
> > real query there are about 1k videos and a couple million views. That
> > took about 80 minutes to run, according to logs.
>
> The planner will choose different plans based on, among other things,
> what it estimates the size of the result to be, so while looking at a
> small example query might seem like a way to go about looking at
> what's going on, it's most likely not going to give you an accurate
> representation of the situation. Are you looking at two different
> systems (e.g., a development system versus a production system) or
> just choosing a smaller query on the same system? If you can't run
> the query on your production system, you may want to take a dump of
> the production system and set it up on another box. Even with a
> couple million rows in the video_views table, PostgreSQL shouldn't
> really blink too much, as long as the server is tuned properly, the
> hardware is adequate, and the database statistics are up to date.
>
> Michael Glaesemann
> grzm seespotcode net
>
>
>
>

Sorry, I mentioned that it took 90 seconds to run the query but I
didn't show that EXPLAIN ANALYZE output.

Here it is, same query just with a different company_id:

Limit (cost=879283.07..879283.32 rows=100 width=64) (actual
time=92486.858..92486.891 rows=100 loops=1)
-> Sort (cost=879283.07..879297.15 rows=5632 width=64) (actual
time=92486.856..92486.867 rows=100 loops=1)
Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND
(video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END)
-> GroupAggregate (cost=803054.95..878932.21 rows=5632
width=64) (actual time=67145.471..92484.408 rows=730 loops=1)
-> Sort (cost=803054.95..809363.98 rows=2523610
width=64) (actual time=67076.407..75441.274 rows=5799447 loops=1)
Sort Key: videos.id, videos.title, videos.guid
-> Hash Join (cost=1220.63..237115.16
rows=2523610 width=64) (actual time=31.230..11507.406 rows=5799447
loops=1)
Hash Cond: (video_views.asset_id = assets.id)
-> Hash Join (cost=535.62..179627.88
rows=2525294 width=60) (actual time=13.286..7621.950 rows=5799447
loops=1)
Hash Cond: (video_views.video_id = videos.id)
-> Seq Scan on video_views
(cost=0.00..101352.70 rows=5998470 width=12) (actual
time=0.023..2840.718 rows=5998341 loops=1)
-> Hash (cost=465.23..465.23
rows=5632 width=52) (actual time=13.216..13.216 rows=5712 loops=1)
-> Seq Scan on videos
(cost=0.00..465.23 rows=5632 width=52) (actual time=0.038..9.060
rows=5712 loops=1)
Filter: (company_id = 11)
-> Hash (cost=487.78..487.78 rows=15778
width=12) (actual time=17.876..17.876 rows=15778 loops=1)
-> Seq Scan on assets
(cost=0.00..487.78 rows=15778 width=12) (actual time=0.032..10.880
rows=15778 loops=1)
Total runtime: 92548.006 ms
(17 rows)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pg Coder 2007-07-18 22:43:35 IN clause performance
Previous Message Tom Lane 2007-07-18 22:24:41 Re: Update of table lags execution of statement by >1 minute?