Re: cross tables, SELECT expressions, and GROUP BY problem

From: Ross Johnson <ross(dot)johnson(at)homemail(dot)com(dot)au>
To: PostgreSQL SQL List <pgsql-sql(at)postgresql(dot)org>
Subject: Re: cross tables, SELECT expressions, and GROUP BY problem
Date: 2006-04-02 12:15:26
Message-ID: 1143980127.8841.254.camel@desk.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Following up my own question again, I've realised my error and solved my
problem - in the interests of completing this thread, the working query
(two versions) can be found below...

On Sun, 2006-04-02 at 18:00 +1000, Ross Johnson wrote:
> On Sun, 2006-04-02 at 16:43 +1000, Ross Johnson wrote:
> > Hi,
> >
> > I'm relatively stretched when it comes to SQL but I'm getting there. I
> > have a question though:
> >
> > Thanks to various web sites I've succeeded in creating simple cross
> > tables in PostgreSQL. I'm now trying more complicated examples
> > (migrating queries from MS Access to PostgreSQL). I'm stuck on getting
> > grouping to work where the selection element isn't a real table field,
> > where it is generated by a CASE statement. Here's a full actual SELECT
> > statement, that works but isn't quite what I need, followed by the first
> > few rows of output:
> >
> > SELECT DISTINCT
> > CASE
> > WHEN lower(c."Order") = 'coleoptera' THEN 5
> > WHEN lower(c."Order") = 'trichoptera' THEN 8
> > WHEN lower(c."Order") = 'ephemeroptera' THEN 6
> > WHEN lower(c."Order") = 'plecoptera' THEN 7
> > WHEN lower(c."Class") = 'oligochaeta' THEN 1
> > WHEN lower(c."Family") LIKE 'chiron%' THEN 2
> > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3
> > ELSE 4
> > END AS "Ranking",
> > CASE
> > WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera'
> > WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'
> > WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera'
> > WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'
> > WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta'
> > WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'
> > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
> > ELSE 'Other'
> > END AS "Taxa",
> > SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
> > SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
> > SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
> > SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
> > FROM "tblBugIDSheetInfo" b
> > INNER JOIN "tblBugCount" c USING ("BugSheetID")
> > GROUP BY c."Order", c."Class", c."Family"
> > ORDER BY "Ranking"
> >
> > Generates the following output:
> >
> > Ranking Taxa 2222011 2222012 2222013 2222014
> > ---------------------------------------------------------------
> > 1 "Oligochaeta" 487 1711 1759 1078
> > 1 "Oligochaeta" 7
> > 1 "Oligochaeta"
> > 2 "Chironomidae" 1385 2335 1500 1513
> > 2 "Chironomidae"
> > 3 "Diptera (Other)" 5
> > 3 "Diptera (Other)" 1 1 3
> > 3 "Diptera (Other)" 199 19 40 37
> > 3 "Diptera (Other)"
> > ...
>
> I should add that I've just tried the following query, which is just a
> rearrangement of the above query using a sub-SELECT, and the result is
> the same as above apart from a slightly different ordering of the rows.
> That is, GROUP BY "Ranking" still doesn't appear to do anything. I was
> under the impression that a sub-SELECT creates a temporary, or at least
> a pseudo-temporary table (a "virtual" table as it's called in the
> documentation), so I would have expected this query to work even if the
> above one doesn't.
>
>
> SELECT DISTINCT
> "Ranking", "Taxa",
> SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
> SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
> SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
> SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
> FROM "tblBugIDSheetInfo" b
> INNER JOIN (
> SELECT "BugSheetID",
> CASE
> WHEN lower("Order") = 'coleoptera' THEN 5
> WHEN lower("Order") = 'trichoptera' THEN 8
> WHEN lower("Order") = 'ephemeroptera' THEN 6
> WHEN lower("Order") = 'plecoptera' THEN 7
> WHEN lower("Class") = 'oligochaeta' THEN 1
> WHEN lower("Family") LIKE 'chiron%' THEN 2
> WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3
> ELSE 4
> END AS "Ranking",
> CASE
> WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera'
> WHEN lower("Order") = 'trichoptera' THEN 'Trichoptera'
> WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera'
> WHEN lower("Order") = 'plecoptera' THEN 'Plecoptera'
> WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta'
> WHEN lower("Family") LIKE 'chiron%' THEN 'Chironomidae'
> WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
> ELSE 'Other'
> END AS "Taxa",
> "Order", "Class", "Family", "Count"
> FROM "tblBugCount") c USING ("BugSheetID")
> GROUP BY c."Ranking", c."Taxa", c."Order", c."Class", c."Family"
> ORDER BY "Ranking"
>

The problem was I was trying to do too much in one GROUP BY clause.
After getting the right result using a temporary table I backtracked and
found that the following version gives me the result I'm looking for.

SELECT DISTINCT
"Ranking", "Taxa",
SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
FROM "tblBugIDSheetInfo" b
INNER JOIN (
SELECT "BugSheetID",
CASE
WHEN lower("Order") = 'coleoptera' THEN 5
WHEN lower("Order") = 'trichoptera' THEN 8
WHEN lower("Order") = 'ephemeroptera' THEN 6
WHEN lower("Order") = 'plecoptera' THEN 7
WHEN lower("Class") = 'oligochaeta' THEN 1
WHEN lower("Family") LIKE 'chiron%' THEN 2
WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3
ELSE 4
END AS "Ranking",
CASE
WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera'
WHEN lower("Order") = 'trichoptera' THEN 'Trichoptera'
WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera'
WHEN lower("Order") = 'plecoptera' THEN 'Plecoptera'
WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta'
WHEN lower("Family") LIKE 'chiron%' THEN 'Chironomidae'
WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
ELSE 'Other'
END AS "Taxa",
"Order", "Class", "Family", SUM("Count") AS "Count"
FROM "tblBugCount"
GROUP BY "BugSheetID", "Order", "Class", "Family") c USING ("BugSheetID")
GROUP BY "Ranking", "Taxa"
ORDER BY "Ranking"

And modifying the original query gives the faster performance:

SELECT "Ranking",
"Taxa",
SUM("2222011") AS "2222011",
SUM("2222012") AS "2222012",
SUM("2222013") AS "2222013",
SUM("2222014") AS "2222014"
FROM (
SELECT DISTINCT
CASE
WHEN lower(c."Order") = 'coleoptera' THEN 5
WHEN lower(c."Order") = 'trichoptera' THEN 8
WHEN lower(c."Order") = 'ephemeroptera' THEN 6
WHEN lower(c."Order") = 'plecoptera' THEN 7
WHEN lower(c."Class") = 'oligochaeta' THEN 1
WHEN lower(c."Family") LIKE 'chiron%' THEN 2
WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3
ELSE 4
END AS "Ranking",
CASE
WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera'
WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'
WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera'
WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'
WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta'
WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'
WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
ELSE 'Other'
END AS "Taxa",
SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
FROM "tblBugIDSheetInfo" b
INNER JOIN "tblBugCount" c USING ("BugSheetID")
WHERE b."LocationCode" = '2222011'
OR b."LocationCode" = '2222012'
OR b."LocationCode" = '2222013'
OR b."LocationCode" = '2222014'
GROUP BY c."Order", c."Class", c."Family") d
GROUP BY "Ranking", "Taxa"
ORDER BY "Ranking"

Both these queries give me the result I was after:

> > I want to GROUP on the "Ranking" field as well so that all rows with the
> > same "Ranking" value are SUMmed. That is, I'm trying to achieve the
> > following:
> >
> > Ranking Taxa 2222011 2222012 2222013 2222014
> > ---------------------------------------------------------------
> > 1 "Oligochaeta" 494 1711 1759 1078
> > 2 "Chironomidae" 1385 2335 1500 1513
> > 3 "Diptera (Other)" 199 20 41 45
> > ...

Applying the grouping at the appropriate points helps :)

Regards.
Ross Johnson

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Paul M Foster 2006-04-03 03:43:34 References NULL field
Previous Message Ross Johnson 2006-04-02 08:00:08 Re: cross tables, SELECT expressions, and GROUP BY problem