From: | Jasen Betts <jasen(at)xnet(dot)co(dot)nz> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #4916: wish: more statistical functions (median, percentiles etc) |
Date: | 2009-07-11 14:12:42 |
Message-ID: | h3a6kq$ekn$1@reversiblemaps.ath.cx |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-general |
On 2009-07-10, Richard Neill <rn214(at)cam(dot)ac(dot)uk> wrote:
>
> The following bug has been logged online:
>
> Bug reference: 4916
> Logged by: Richard Neill
> Email address: rn214(at)cam(dot)ac(dot)uk
> PostgreSQL version: 8.4
> Operating system: Linux
> Description: wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc), it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard
percentile isn't really an agregate function as its value is mainly
determined by two (or fewer) values in the data set.
here's a function that works on arbitrarily named tables,
by building queries to get the needed statistics and then
using order by, offset, limit. Exploiting
an index (if present) to retrtieve the data values.
it won't work (well) on subqueries, but as 'col' and tbl are substitutd in
unescaped you can use joins and expressions if needed (expect a
performance hit if you do!)
-- percentile function:
-- usage example: select percentile('sometable','somecolumn','true', 0.45);
-- 'true' is a where constraint, 0.45 gets the 45th percentile.
--
-- for best preformance have an index that covers the where condition
-- and col.
CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision as $$
DECLARE
cnt integer; -- count of records
rat double precision; -- ratio
pos double precision; -- postion
res double precision; -- result
BEGIN
EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt;
pos=frc*(cnt-1);
rat=mod(pos::numeric,1);
EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )'
|| ' + (SELECT '||col||' * '||rat::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES;
RETURN res;
END;
$$ LANGUAGE PLPGSQL;
From | Date | Subject | |
---|---|---|---|
Next Message | Joe Conway | 2009-07-12 09:51:39 | Re: BUG #4916: wish: more statistical functions (median, percentiles etc) |
Previous Message | Pavel Stehule | 2009-07-11 06:34:12 | Re: BUG #4916: wish: more statistical functions (median, percentiles etc) |
From | Date | Subject | |
---|---|---|---|
Next Message | Martin Gainty | 2009-07-11 14:31:57 | Re: Using Postgres to store genetic data |
Previous Message | Raymond O'Donnell | 2009-07-11 14:03:24 | Re: Question |