Re: Embarassing GROUP question

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org, Corey Tisdale <corey(at)eyewantmedia(dot)com>
Subject: Re: Embarassing GROUP question
Date: 2009-10-03 20:39:40
Message-ID: 8798.1254602380@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
>> but looking at this example, and presuming that you find that
>> it actually does something useful, I wonder whether they interpret
>> the combination of GROUP BY and ambiguous-per-spec ORDER BY
>> in some fashion similar to DISTINCT ON.

> Yup, does look that way doesn't it. It's still a weird pair of
> semantics to conflate.

I poked around in the MySQL 5.1 manual to see if this is true.
I think it isn't --- it says very clearly here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
that you simply get an arbitrary choice among the possible values
when you reference an ambiguous column. It's possible that Corey's
query actually does give him the answers he wants, but apparently
it would be an implementation artifact that they're not promising
to maintain.

> Hum, if they were assuming that you'd always have to implement GROUP BY
> by doing a sort step first then I can see why they'd end up with this.

It's worse than that --- they actually are promising that GROUP BY
orders the results! In
http://dev.mysql.com/doc/refman/5.1/en/select.html
I find

If you use GROUP BY, output rows are sorted according to the
GROUP BY columns as if you had an ORDER BY for the same
columns. To avoid the overhead of sorting that GROUP BY
produces, add ORDER BY NULL:

SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

MySQL extends the GROUP BY clause so that you can also specify
ASC and DESC after columns named in the clause:

SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

The first of these examples implies that they allow ORDER BY to override
the default GROUP BY sorting, which would mean that the ORDER BY sort
has to happen after the GROUP BY operation, unlike the approach we take
for DISTINCT ON. So that means the ORDER BY *isn't* going to affect
which row gets chosen out of each event_type group.

What I am currently betting is that Corey's query does not really do
what he thinks it does in MySQL. It probably is selecting a random
representative row in each group and then sorting on the basis of the
event_dates in those rows.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Corey Tisdale 2009-10-03 21:56:02 Re: Embarassing GROUP question
Previous Message Tom Lane 2009-10-03 20:14:21 Re: Procedure for feature requests?