Quick Links

Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date:	2009-12-21 05:48:15
Message-ID:	162867790912202148j3381f9a2k52a3b697ab2201d8@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2009/12/20 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
>>>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>
> > 2009/12/20 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> >> I think that we've already expanded the capabilities of aggregates
> >> a great deal for 8.5, and we should let it sit as-is for a release
> >> or two and see what the real user demand is for additional
> >> features.
> >>
> >> I'm particularly concerned by the fact that the feature set is
> >> already far out in front of what the planner can optimize
> >> effectively (e.g., there's no ability to combine the work when
> >> multiple aggregates need the same sorted data). The more features
> >> we add on speculation, the harder it's going to be to close that
> >> gap.
>
> I absolutely agree with Tom here and for some quite specific reasons.
>
> An optimal (or at least more optimal than is currently possible)
> implementation of median() on top of the ordered-agg code as it stands
> requires additions to the aggregate function interface: the median agg
> implementation would have to, as a minimum, know how many rows of
> sorted input are available. In addition, it would be desirable for it
> to have direct (and possibly bidirectional) access to the tuplesort.

I was thinking about some new kind of aggregates based on tuple-store.

>
> Now, if we look at how ordered aggs ought to be optimized, it's clear
> that the planner should take the ordering costs into account and
> consider plans that order the input instead. Once you do this, then
> there's no longer any pre-computed count or tuplesort object
> available, so if you'd implemented a better median() before the
> optimizations, you'd end up either having to forgo the optimization or
> break the median() agg again; clearly not something we want.
>

Information about count of rows are not detected in planner time. This
have to by available in any executor's sort node. So this value will
be available every time - index scan is problem. Interesting is Greg's
info about O(N) algorithms. Then we can implement median as classic
aggregate.

> Plus, a feature that I intentionally omitted from the ordered-aggs
> patch is the ability to do ordered-aggs as window functions. This is
> in the spec, but (a) there were conflicting patches for window
> functions on the table and (b) in my opinion, much of the work needed
> to implement ordered-agg-as-window-func in an effective manner is
> dependent on doing more work on optimization first (or at least will
> potentially become easier as a result of that work).
>

I fully agree, so agg(x order by x) needs some work more - so general
design is premature.

On second hand - any internal implementation of median will be faster,
then currently used techniques.

Pavel

> So I think both the optimization issue and the window functions issue
> would be best addressed before trying to build any additional features
> on top of what we have so far.
>
> --
> Andrew (irc:RhodiumToad)
>

In response to

Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support at 2009-12-20 22:48:33 from Andrew Gierth

Responses

Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support at 2009-12-21 10:13:02 from Greg Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2009-12-21 06:03:18	Re: using separate parameters in psql query execution
Previous Message	Tom Lane	2009-12-21 05:39:16	Re: Possible patch for better index name choosing