Quick Links

Re: query optimization: aggregate and distinct

From:	Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: query optimization: aggregate and distinct
Date:	2003-08-22 10:37:14
Message-ID:	200308220337.14928.jdavis-pgsql@empires.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Thursday 21 August 2003 06:36 am, Tom Lane wrote:
> Jeff Davis <jdavis-pgsql(at)empires(dot)org> writes:
> > I had an idea about using aggregates: what if I made an aggregate
> > function called "first" that just returned the value in the first tuple
> > it encountered?
>
> You could make that work in 7.4, but not in any existing releases.
>
> The trouble is that you need something like
>
> SELECT first(foo) FROM (SELECT ... ORDER BY col1,col2) ss
> GROUP BY col1
>
> and before 7.4 the optimizer doesn't realize that it can skip re-sorting
> at the outer level. So unless the sort is stable (which it won't be, on
> most platforms anyway) the needed ordering by col2 within each group is
> destroyed.
>

Interesting. It turns out I don't really need the hack because I was able to
optimize the query with some reworking and EXPLAIN ANALYZE. Now it takes
about 1 second as opposed to 5.

However, it still has me wondering what the most efficient algorithm would be.

Here is my plan:
- make a new complex type (say, most_recent_t) that's just an int and a
timestamp
- make a function to turn an int and a timestamp into a most_recent_t
- make an aggregate function that takes most_recent_t and finds the int with
the highest timestamp

I tried a preliminary version, but all the functions were in plpgsql, which I
think may have slowed it down (plus, I was using a text[] instead of a
complex type, meaning more converting). The performance was nothing great,
but it seemed like it should have been more efficient. After all, doesn't my
plan skip the sorting phase needed for DISTINCT? The main problem is that I
need to do a lot of extra aggregate calls.

Does that seem like a more efficient plan overall, or would I waste my time
writing all those functions?

regards,
jeff davis

In response to

Re: query optimization: aggregate and distinct at 2003-08-21 13:36:39 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Manfred Koizar	2003-08-22 10:53:29	Re: Buglist
Previous Message	Harald Fuchs	2003-08-22 09:59:00	Re: [GENERAL] Need concrete "Why Postgres not MySQL"