Re: Group By and wildcards...

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Group By and wildcards...
Date: 2005-02-19 19:04:36
Message-ID: 87650oweuz.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bruno Wolff III <bruno(at)wolff(dot)to> writes:

> On Sat, Feb 19, 2005 at 12:07:12 -0200,
> Jon Lapham <lapham(at)jandr(dot)org> wrote:
> >
> > SELECT a.*, b.*, c.*, SUM(d.blah)
> > FROM a, b, c, d
> > WHERE <some join conditions>
> > GROUP BY a.*, b.*, c.*
> >
> > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
> > explicit column names of all the column in a, b, and c.
> >
> > This becomes a maintenance nightmare as you add/drop column in these
> > tables...
>
> Don't those tables have primary keys? Grouping by the primay key of each
> table will produce the same result set as grouping by all of the columns.

Actually it would be kind of nice to have this as a feature. Or mysql's
feature of treating any unnamed columns as something like DISTINCT ON.

However there are a few approaches for dealing with it. None of which are
perfect but if they match your needs they work well.

In the query above you could turn SUM(d.blah) into a subquery expression. This
works well as long as you don't have multiple aggregate queries on the same
table.

SELECT a.*,b.*,c.*,
(SELECT sum(blah) FROM d WHERE ...) AS d_sum
FROM a,b,c

This doesn't require a GROUP BY step which means it'll probably be faster. On
the other hand it effectively forces a nested loop scan on d which is not
necessarily the fastest. And if you have multiple aggregates postgres it
forces separate lookups for the same data. It would be nice to have some
feature for breaking out subquery expressions that return multiple rows into
multiple output columns. Something like:

SELECT a.*,b.*,c.*,
(SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg)
FROM a,b,c


You could also turn the above into a more complex join like:

SELECT *
FROM a,b,c,
(SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d
WHERE ...
AND c.groupname = d.groupname

This works well as long as you didn't have the aggregate function applying to
overlapping subsets of d before. (eg, it won't work for sum(product.price) if
multiple invoices can contain the same product).

alternatively you can do something like

SELECT *
FROM a,b,c,
(select a.id as a_id, b.id as b_id, c.id as c_id,
sum(blah) as d_sum
from a,b,c,d
where ...
group by a.id,b.id,c.id
) AS sub
WHERE a.id = a_id
AND b.id = b_id
AND c.id = c_id

But that's pretty silly and not usually necessary.

--
greg

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Reuben D. Budiardja 2005-02-19 19:06:11 Re: Client lib v. 7.3 to access 8.0 db server. Compatible ?
Previous Message Oisin Glynn 2005-02-19 19:02:34 Re: Group By and wildcards...