Re: DISTINCT/Optimizer question

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Beth Jen <raelys(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: DISTINCT/Optimizer question
Date: 2006-07-08 04:21:27
Message-ID: 87fyhc681k.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> On Fri, Jul 07, 2006 at 01:25:53PM -0400, Beth Jen wrote:
> > Right now, the distinct clause adds its targets to the sort clause list when
> > it is parsed. This causes an automatic insertion of the sort node into the
> > query plan before the application of the unique node. The hash-based
> > implementation however is meant to bypass the need to sort. I could just
> > remove this action, but the optimizer should only consider using the
>
> <snip>
>
> My laymans opinion suggests that this needs a new specific "distinct
> clause" which looks a lot like a sort clause only isn't. And then in
> the planner this clause would either be converted to your new node type
> or the traditional sort node.

I had always assumed that the way forward here was just to convert the
DISTINCT query into the equivalent GROUP BY query. No sense in having two
separate code paths that handle precisely the same behaviour.

> Have you considered how your code interacts with DISTINCT ON ()?
> Perhaps a clue lies there...

Therein lies the rub. There are equivalent GROUP BY forms for DISTINCT ON
queries but they aren't optimized effectively currently. Until they are
DISTINCT ON can't be translated into GROUP BY queries. I would suggest working
on optimizing those cases (min(), max(), first(), last() with GROUP BY over a
sorted subquery) and then translating DISTINCT ON queries as well.

But afaict there's nothing stopping Postgres from converting plain old
standard DISTINCT queries into GROUP BY queries currently.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message hubert depesz lubaczewski 2006-07-08 11:35:42 Re: request for feature: psql 'DSN' option
Previous Message Martijn van Oosterhout 2006-07-07 21:18:45 Re: DISTINCT/Optimizer question