Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date: 2009-12-21 10:13:02
Message-ID: 407d949e0912210213u39b857b0qa842885d715079e6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Information about count of rows are not detected in planner time. This
>  have to by available in any executor's sort node. So this value will
> be available every time - index scan is problem. Interesting is Greg's
> info about O(N) algorithms. Then we can implement median as classic
> aggregate.
>...
> On second hand - any internal implementation of median will be faster,
> then currently used techniques.

Some further information now that I'm on a real keyboard.

The O(n) algorithm for median of which I'm aware is Quickselect. It's
based on Quicksort which makes it unsuitable for a disk-based
algorithm since it would have to read every block many times. Perhaps
there's some way to adapt it or there might be some other O(n)
algorithm which has better i/o patterns.

But in cases where the tupleset/tuplesort is still in memory it would
be easy to add support for pulling out the nth element and use that in
a magic median() function. It wouldn't be a "classic" aggregate, at
least as I understand it where you also need something like O(1) space
to store the state, because you definitely need access to the whole
tupleset to do the quickselect.

If we don't find a way to optimize this for on-disk tuplestores though
then I wonder whether it's worth it. It would only help in the narrow
case that you had a large enough set to see the difference between
doing quickselect and quicksort but small enough to fit in workmem. I
suppose that case is widening over time though as memory sizes get
bigger and bigger.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Bunce 2009-12-21 10:45:31 Minimum perl version supported
Previous Message Hiroyuki Yamada 2009-12-21 09:42:16 Re: alpha3 release schedule?