From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | maarten <maarten(dot)foque(at)edchq(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: median for postgresql 8.3 |
Date: | 2010-11-17 15:46:19 |
Message-ID: | AANLkTim098=Y6XhOn8qfbpDKxMbCdBtWr=jRE9XKFqSQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2010/11/17 Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>:
> On 16 November 2010 17:37, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> Hello
>>
>> see http://okbob.blogspot.com/2009/11/aggregate-function-median-in-postgresql.html
>>
>
> An 8.3-compatible way of doing it is:
>
> SELECT CASE WHEN c % 2 = 0 AND c > 1 THEN (a[1]+a[2])/2 ELSE a[1] END
> FROM
> (
> SELECT ARRAY(SELECT a FROM milrows ORDER BY a OFFSET (c-1)/2 LIMIT 2) AS a, c
> FROM (SELECT count(*) AS c FROM milrows) AS count
> OFFSET 0
> )
> AS midrows;
>
nice :)
Pavel
> In my tests this is faster than the analytic and array-based methods,
> but not by a huge amount.
>
> Regards,
> Dean
>
>
>
>> Regards
>>
>> Pavel Stehule
>>
>> 2010/11/16 maarten <maarten(dot)foque(at)edchq(dot)com>:
>>> Hello everyone,
>>>
>>> I was doing some analysis of data to find average delays between some
>>> timestamp values etc...
>>> When the number of rows the average is computed over is small, this can
>>> give distorted values. So I've obviously added a count column to see if
>>> the average represents much data.
>>> However, I would also like to add the median value to give me a pretty
>>> good idea of whats happening even for smaller counts.
>>>
>>> I couldn't find such an aggregate function in the manual (version 8.3)
>>> and some websearching didn't uncover it either.
>>>
>>> I was thinking about
>>> SELECT max(id) FROM test ORDER BY id ASC LIMIT
>>> (SELECT count(*)/2 FROM test)
>>>
>>> But two things are wrong with that:
>>> Limit can't use subqueries :(
>>> And ORDER BY gives me the error: 'must be used in aggregate function
>>> etc...) but I can probably work around this by using an ordered subquery
>>> in stead of the table directly.
>>>
>>> Furthermore, I need the median for a timestamp column, which would
>>> probably complicate things more than when it is a number column.
>>>
>>> I'd like to be able to do this using only the database. (So no
>>> programming functions, special addons etc...)
>>>
>>> Any ideas anyone?
>>>
>>> regards,
>>> Maarten
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-11-17 16:07:08 | Re: Survey on backing up unlogged tables: help us with PostgreSQL development! |
Previous Message | Scott Mead | 2010-11-17 15:41:11 | Re: Survey on backing up unlogged tables: help us with PostgreSQL development! |