From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support |
Date: | 2009-12-21 11:43:24 |
Message-ID: | 162867790912210343r9ce2d53s51616392c3003042@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2009/12/21 Greg Stark <gsstark(at)mit(dot)edu>:
> On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> Information about count of rows are not detected in planner time. This
>> have to by available in any executor's sort node. So this value will
>> be available every time - index scan is problem. Interesting is Greg's
>> info about O(N) algorithms. Then we can implement median as classic
>> aggregate.
>>...
>> On second hand - any internal implementation of median will be faster,
>> then currently used techniques.
>
> Some further information now that I'm on a real keyboard.
>
> The O(n) algorithm for median of which I'm aware is Quickselect. It's
> based on Quicksort which makes it unsuitable for a disk-based
> algorithm since it would have to read every block many times. Perhaps
> there's some way to adapt it or there might be some other O(n)
> algorithm which has better i/o patterns.
>
> But in cases where the tupleset/tuplesort is still in memory it would
> be easy to add support for pulling out the nth element and use that in
> a magic median() function. It wouldn't be a "classic" aggregate, at
> least as I understand it where you also need something like O(1) space
> to store the state, because you definitely need access to the whole
> tupleset to do the quickselect.
>
I can check speed diferences on orafce's median implementation. But
orafce isn't correct - it ignores work_mem limit - so it usable only
for some external modules or for testing. I'll try simple median
implementation based on aggregate API. Then I'll compare speed.
> If we don't find a way to optimize this for on-disk tuplestores though
> then I wonder whether it's worth it. It would only help in the narrow
> case that you had a large enough set to see the difference between
> doing quickselect and quicksort but small enough to fit in workmem. I
> suppose that case is widening over time though as memory sizes get
> bigger and bigger.
Thank you
I have to do same test and get more info about quickselect
Pavel
>
> --
> greg
>
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2009-12-21 11:56:13 | Re: using separate parameters in psql query execution |
Previous Message | Simon Riggs | 2009-12-21 11:34:01 | Re: alpha3 release schedule? |