From: | Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com> |
---|---|
To: | "Matt Magoffin" <postgresql(dot)org(at)msqr(dot)us> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: How pull |
Date: | 2007-09-24 01:15:47 |
Message-ID: | 92869e660709231815pe28db6cp6d1f132f16f2c181@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
On 23/09/2007, Matt Magoffin <postgresql(dot)org(at)msqr(dot)us> wrote:
> Hello,
>
> My SQL skills are limited and I'm struggling with a query where I want to
> return a single item of an aggregate join. The query looks like this:
>
> select
> (case
> when agg.avg_rating is null then 0.0
> when agg.avg_rating < 0.75 then 0.5
> when agg.avg_rating < 1.25 then 1.0
> when agg.avg_rating < 1.75 then 1.5
> when agg.avg_rating < 2.25 then 2.0
> when agg.avg_rating < 2.75 then 2.5
> when agg.avg_rating < 3.25 then 3.0
> when agg.avg_rating < 3.75 then 3.5
> when agg.avg_rating < 4.25 then 4.0
> when agg.avg_rating < 4.75 then 4.5
> else 5.0
> end) as avg_rating,
> count(item.itemid) as item_count
> from media_item item
> inner join (
> select rating.mediaitem_userrating_hjid as ritemid,
> avg(rating.rating) as avg_rating
> from media_item_rating rating, media_item item
> where rating.mediaitem_userrating_hjid = item.itemid
> group by rating.mediaitem_userrating_hjid
> ) as agg
> on item.itemid = agg.ritemid
> group by avg_rating
> order by avg_rating desc
>
> and a sample of results is this:
>
> avg_rating | item_count
> ------------+------------
> 5.0 | 21
> 4.0 | 33
> 3.0 | 13
> 2.0 | 4
> 1.0 | 1
>
> What I want as well is the ID of the item (and possibly it's avg_rating
> value) from the "agg" join with the highest avg_rating for each output
> row... something like this
>
> avg_rating | item_count | item_id | item_rating
> ------------+-----------------------------------
> 5.0 | 21 | 109890 | 4.9
> 4.0 | 33 | 89201 | 4.1
> 3.0 | 13 | 119029 | 2.8
> 2.0 | 4 | 182999 | 2.2
> 1.0 | 1 | 1929 | 1.0
>
> So the intention in this example is that item #109890 has an average
> rating of 4.9 and that is the highest rating within the > 4.75 rating
> group.
>
> If anyone had any tips I'd greatly appreciate it.
>
create ranking function to make queries look simpler:
create or replace function ranking_group(numeric) returns numeric as
$$ select case
when $1 < 0.3456 then 'quite small'
...
end $$ language sql immutable;
(I'd make it STRICT, but you allow null rankings)
1st way: DISTINCT ON + subquery
select *, (select count(*) from rating where rating_group(rating) =
subq.rating_group ) as rating_group_size
from (
select distinct on (rating_group)
rating_group(r.rating),
r.item_id as best_rated_item_id,
r.rating as best_rating
from rating r
order by rating_group desc, r.rating desc
) subq;
2nd way (faster - actually 2 x faster)
using FIRST aggregate to calculate all in one pass
create function first(numeric,numeric) returns numeric as 'select $1'
language sql immutable strict;
create function first(integer,integer) returns integer as 'select $1'
language sql immutable strict;
create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );
select
rating_group(rating),
count(*) as num_ratings,
first(item_id) as best_rated_item_id,
first(rating) as best_rating
from ( select * from rating order by rating desc ) ordered_ratings
group by rating_group
order by rating_group desc;
note: if you can, get rid of null ratings. what are they supposed to
mean? they make things a bit more complicated.
--
Filip Rembiałkowski
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Jones | 2007-09-24 04:56:24 | Re: Many databases |
Previous Message | Matt Magoffin | 2007-09-23 22:26:36 | How pull |