From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: wip: functions median and percentile |
Date: | 2010-10-04 06:58:44 |
Message-ID: | AANLkTin7kZ1mA271=L4pemQFt2F6HRDROnkrshOZfJsd@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-rrreviewers |
On 4 October 2010 07:36, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
> 2010/10/4 Greg Stark <gsstark(at)mit(dot)edu>:
>> On Sun, Oct 3, 2010 at 7:06 AM, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
>>> And I'm now thinking about how to make median happen in window
>>> aggregate.
>>
>> If you were to do this by extending tuplesort what extra features
>> would tuplesort need?
>
> I don't think we need to extend tuplesort when we do it. IMO this can
> make happen by fetching all the values in the frame and performing
> sort once, then getting current position and calculate median. The
> only thing I consider is to mark and to go to the demanded position in
> tuplesort like tuplestore, but actually we can manage to median()
> without such facilities.
>
I dont' think that works, because the ORDER BY in the WINDOW doesn't
necessarily match the sort order that the median function needs.
Consider for example:
select a, median(a) over(order by a desc) from generate_series(1,10) as g(a);
a | median
----+--------------------
10 | 10
9 | 9.5000000000000000
8 | 9
7 | 8.5000000000000000
6 | 8
5 | 7.5000000000000000
4 | 7
3 | 6.5000000000000000
2 | 6
1 | 5.5000000000000000
(10 rows)
That requires a new sort for each row. I generated this with a minor
tweak to Pavel's patch to just restart the tuplesort each time (the
"quick-fix" solution). The problem is that performance really sucks,
because it is an O(n^2 log(n)) algorithm. I don't see an easy way
around that without significant new infrastructure, as Greg describes,
or a completely different algorithm.
>> How do existing windowed aggregates work if you specify an order by on
>> the aggregate? Do they resort for every output row?
>
Isn't the issue that they don't need to sort, so they all work unchanged.
Regards,
Dean
> I don't know the true specification of median() nor the behavior of it
> in other RDBs, but I bet median is median and ORDER BY in window
> clause doesn't affect its result. ORDER BY specifies only the frame
> boundary.
>
> Regards,
>
>
>
> --
> Hitoshi Harada
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
From | Date | Subject | |
---|---|---|---|
Next Message | Markus Wanner | 2010-10-04 07:03:28 | Re: is sync rep stalled? |
Previous Message | Pavel Stehule | 2010-10-04 06:52:43 | todo point: plpgsql - scrollable cursors are supported |
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2010-10-04 11:46:23 | Re: [HACKERS] top-level DML under CTEs |
Previous Message | Hitoshi Harada | 2010-10-04 06:36:48 | Re: wip: functions median and percentile |