Re: How pull

From: "Matt Magoffin" <postgresql(dot)org(at)msqr(dot)us>
To: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: How pull
Date: 2007-09-24 07:37:53
Message-ID: 3658.192.168.1.65.1190619473.squirrel@msqr.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> On 23/09/2007, Matt Magoffin <postgresql(dot)org(at)msqr(dot)us> wrote:
>> Hello,
>>
>> My SQL skills are limited and I'm struggling with a query where I want
>> to
>> return a single item of an aggregate join. The query looks like this:
>>
>> select
>> (case
>> when agg.avg_rating is null then 0.0
>> when agg.avg_rating < 0.75 then 0.5
>> when agg.avg_rating < 1.25 then 1.0
>> when agg.avg_rating < 1.75 then 1.5
>> when agg.avg_rating < 2.25 then 2.0
>> when agg.avg_rating < 2.75 then 2.5
>> when agg.avg_rating < 3.25 then 3.0
>> when agg.avg_rating < 3.75 then 3.5
>> when agg.avg_rating < 4.25 then 4.0
>> when agg.avg_rating < 4.75 then 4.5
>> else 5.0
>> end) as avg_rating,
>> count(item.itemid) as item_count
>> from media_item item
>> inner join (
>> select rating.mediaitem_userrating_hjid as ritemid,
>> avg(rating.rating) as avg_rating
>> from media_item_rating rating, media_item item
>> where rating.mediaitem_userrating_hjid = item.itemid
>> group by rating.mediaitem_userrating_hjid
>> ) as agg
>> on item.itemid = agg.ritemid
>> group by avg_rating
>> order by avg_rating desc
>>
>> and a sample of results is this:
>>
>> avg_rating | item_count
>> ------------+------------
>> 5.0 | 21
>> 4.0 | 33
>> 3.0 | 13
>> 2.0 | 4
>> 1.0 | 1
>>
>> What I want as well is the ID of the item (and possibly it's avg_rating
>> value) from the "agg" join with the highest avg_rating for each output
>> row... something like this
>>
>> avg_rating | item_count | item_id | item_rating
>> ------------+-----------------------------------
>> 5.0 | 21 | 109890 | 4.9
>> 4.0 | 33 | 89201 | 4.1
>> 3.0 | 13 | 119029 | 2.8
>> 2.0 | 4 | 182999 | 2.2
>> 1.0 | 1 | 1929 | 1.0
>>
>> So the intention in this example is that item #109890 has an average
>> rating of 4.9 and that is the highest rating within the > 4.75 rating
>> group.
>>
>> If anyone had any tips I'd greatly appreciate it.
>>
>
> create ranking function to make queries look simpler:
>
> create or replace function ranking_group(numeric) returns numeric as
> $$ select case
> when $1 < 0.3456 then 'quite small'
> ...
> end $$ language sql immutable;
> (I'd make it STRICT, but you allow null rankings)
>
> 1st way: DISTINCT ON + subquery
>
> select *, (select count(*) from rating where rating_group(rating) =
> subq.rating_group ) as rating_group_size
> from (
> select distinct on (rating_group)
> rating_group(r.rating),
> r.item_id as best_rated_item_id,
> r.rating as best_rating
> from rating r
> order by rating_group desc, r.rating desc
> ) subq;
>
>
> 2nd way (faster - actually 2 x faster)
> using FIRST aggregate to calculate all in one pass
>
> create function first(numeric,numeric) returns numeric as 'select $1'
> language sql immutable strict;
> create function first(integer,integer) returns integer as 'select $1'
> language sql immutable strict;
> create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
> create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );
>
> select
> rating_group(rating),
> count(*) as num_ratings,
> first(item_id) as best_rated_item_id,
> first(rating) as best_rating
> from ( select * from rating order by rating desc ) ordered_ratings
> group by rating_group
> order by rating_group desc;
>
>
> note: if you can, get rid of null ratings. what are they supposed to
> mean? they make things a bit more complicated.

Thanks very much for the helpful tips, Filip, you understand perfectly
what I'm trying to do. For those nulls, they are not actually needed with
the query I gave, you are right. I have another version of this query that
I started with that uses a left outer join instead of the inner join to
produce another grouping for all items that have no rating set (thus the
null value).

An aggregate function like first() is precisely what I need, however I am
looking for a fairly portable solution I can also use in other databases
(Derby and MySQL at a minimum). Unfortunately I don't think either of
these suggestions will work for those systems because they don't support
DISTINCT ON or custom SQL aggregate functions as nicely as Postgres does.

Also, the "avg_rating" I'm using is itself an aggregate value, but I don't
see how your second solution handles this? The rating group values come
from the avg() calculation on a rating table, like

ratinguser | rating | item_id | idx
------------+--------+---------+-----
1098 | 2 | 11016 | 1
2 | 3 | 11016 | 0
43 | 5 | 9021 | 2
2 | 4 | 9021 | 1
1098 | 2 | 9021 | 0

So here the rating for item 11016 is 2.5 and for 9021 it is 3.66.

In addition, this is not my entire SQL statement, because I need another
join to narrow the item results first... so the full SQL I currently have
is like

select
(case
when agg.avg_rating < 0.75 then 0.5
when agg.avg_rating < 1.25 then 1.0
when agg.avg_rating < 1.75 then 1.5
when agg.avg_rating < 2.25 then 2.0
when agg.avg_rating < 2.75 then 2.5
when agg.avg_rating < 3.25 then 3.0
when agg.avg_rating < 3.75 then 3.5
when agg.avg_rating < 4.25 then 4.0
when agg.avg_rating < 4.75 then 4.5
else 5.0
end) as avg_rating,
count(item.itemid) as item_count
from media_item item
inner join (
select mi.itemid as sitemid
from media_item mi
inner join album_item ai on mi.itemid = ai.mediaitem_item_hjchildid
inner join album a on ai.album_item_hjid = a.albumid
where a.owner_ = 2 and a.allowanonymous = TRUE and a.allowbrowse = TRUE
) as shared
on item.itemid = shared.sitemid
left outer join ( -- change to inner for only items with a rating
select rating.mediaitem_userrating_hjid as ritemid,
avg(rating.rating) as avg_rating
from media_item_rating rating, media_item item
where rating.mediaitem_userrating_hjid = item.itemid
group by rating.mediaitem_userrating_hjid
) as agg
on item.itemid = agg.ritemid
group by avg_rating
order by avg_rating desc

Thanks again for the helpful tips.

In response to

  • Re: How pull at 2007-09-24 01:15:47 from Filip Rembiałkowski

Browse pgsql-sql by date

  From Date Subject
Next Message ivan marchesini 2007-09-24 12:34:46 foreign key problem
Previous Message Erik Jones 2007-09-24 04:56:24 Re: Many databases