Re: GROUP BY requirement

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: GROUP BY requirement
Date: 2005-08-30 18:35:25
Message-ID: 1125426924.28179.78.camel@state.g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 2005-08-26 at 14:39, Bill Moseley wrote:
> I'm wondering if adding a GROUP BY (as required by Postgres) will
> change the results of a select on a view.
>
> I have the following view which joins a "class" with a teacher. A
> teacher is a "person" and I have an "instructors" link table.
>
> CREATE VIEW class_list (id, class_time, instructor )
> AS
> SELECT DISTINCT ON(class.id)
> class.id, class.class_time, person.first_name
>
> FROM class, instructors, person
> WHERE instructors.person = person.id
> AND class.id = instructors.class;
>
> I also have a table "registration" that links students with a class.
> The registration table has a "reg_status" column to say if they are
> confirmed or on the wait_list. So when showing the above I'd also
> like to see how many students are confirmed and on the wait_list.
>
> DROP VIEW cl;
> CREATE VIEW cl (id, class_time, instructor,
> confirmed_cnt, wait_list_cnt)
> AS
> SELECT DISTINCT ON(class.id)
> class.id, class.class_time, person.first_name,
> sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
> sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,
>
> FROM class, instructors, person, registration
> WHERE instructors.person = person.id
> AND class.id = instructors.class
> AND class.id = registration.class
>
> GROUP BY class.id, class.class_time, person.first_name;
>
> PostgreSQL requires the GROUP BY. But, I'm not clear how the GROUP BY
> might change the results between the two views above.
>
> http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY
>
> says:
>
> When GROUP BY is present, it is not valid for the SELECT list
> expressions to refer to ungrouped columns except within aggregate
> functions, since there would be more than one possible value to
> return for an ungrouped column.
>
> Frankly, I cannot see how it might change results of a select between
> the two views. Am I missing something?

OK, distinct on suffers from this problem. Given the following simple
dataset:

mytable:
a | b
------
1 | 0
1 | 1

select distinct on (a) a,b from mytable;

One can see how the possible results are:

1,0 and 1,1, right? All depending on the order in which they are
fetched.

The same would be true if you could do a group by on a and select b:

select a,b from mytable group by a;

Right?

Now, if it's impossible for your dataset to return such sets, due to the
way it's built, it is likely not fully normalized. I.e. you have data
like this:

classid | instructorname | moreinfo...
--------------------------------------
1 | 'John Smith' | 'information'
1 | 'John Smith' | 'even more information'

and so on. Or your join is creating such a data set.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2005-08-30 18:48:35 Re: POSS. FEATURE REQ: "Dynamic" Views
Previous Message Andrew Sullivan 2005-08-30 17:56:25 Re: Postgresql replication