Re: Creating a query to structure results grouped by two columns, pivoting only a third colum

From: Iuri Sampaio <iuri(dot)sampaio(at)gmail(dot)com>
To: Steve Midgley <science(at)misuse(dot)org>
Cc: pgsql-sql <pgsql-sql(at)lists(dot)postgresql(dot)org>
Subject: Re: Creating a query to structure results grouped by two columns, pivoting only a third colum
Date: 2021-09-27 14:06:47
Message-ID: 66CD7D72-62D0-4ED1-BA98-3C8B9ED738C1@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Using PySpark API, the solution would be

````
df = df_table1.groupBy("area", "canal").pivot("date").sum("peso")
````

In oracle 11g, it would be sort of

````
SELECT * FROM table1 PIVOT (sum(peso) FOR date IN (
'2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01’
) P
````

Oracle has this strange way of dealing with PIVOT, without even need to group columns.

If I were to write the query, ut would be somehtinglike this (i.e. similar to Oracle but more explicit).

SELECT * FROM (
SELECT
date,
area,
sub_canal,
SUM(peso) AS peso
FROM table1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
) t1
PIVOT (
peso FOR dat_referencia_mes IN ('2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01' )
) p

another thing that I would have done is to replace the static set of dates to a dynamic one. (i.e. SELECT DISTINCT date FROM table1)

However, I’m dealing with syntax errors now, plus once I fix them I’m not sure if that will work ( I mean

> On Sep 25, 2021, at 21:53, Steve Midgley <science(at)misuse(dot)org> wrote:
>
>
>
> On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri(dot)sampaio(at)gmail(dot)com <mailto:iuri(dot)sampaio(at)gmail(dot)com>> wrote:
> [FROM]
> ###########################
> date | area | canal | peso
> 2021-03-01 area1 can1 45.6768
> 2021-03-01 area1 can2 54.6768
> 2021-03-01 area1 can3 87.6
> 2021-03-01 area2 can1 1.87
> 2021-03-01 area2 can2 12.7687
> 2021-03-01 area2 can3 965.568
> 2021-03-01 area3 can1 968.95
> 2021-03-01 area3 can2 1.6867
> 2021-03-01 area3 can3 8.897
> …. … … ...
> 2021-06-01 area1 can1
> 2021-06-01 area1 can2
> 2021-06-01 area1 can3
> 2021-06-01 area2 can1
> … … ...
>
> 2021-12-01 area1
> 2021-12-01 area1
> 2021-12-01 area1
> ###########################
>
>
> [TO]
>
> ###########################
> area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...
> peso peso peso peso
>
> area1 can1 45.6768 875.98 1.232 …
> area1 can2 54.6768 665.8 2.43 ...
> area1 can3 87.6 65.8 4.76 ...
> area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...
>
> area2 can1 1.87 … ...
> area2 can2 12.7687
> area2 can3 965.568
> area2 total SUM(1.87+12.7687+965.568) … ...
>
>
> area3 can1 968.95
> area3 can2 1.6867
> area3 can3 8.897
>
> ...
> ###########################
>
>
> I think you'd want to use the "crosstab" function in the module "tablefunc" https://www.postgresql.org/docs/current/tablefunc.html <https://www.postgresql.org/docs/current/tablefunc.html>
>
> This post goes into quite a bit of detail on how to implement this type of query: https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905 <https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905>
>
> Does that get you close enough?
> Steve
>
> On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri(dot)sampaio(at)gmail(dot)com <mailto:iuri(dot)sampaio(at)gmail(dot)com>> wrote:
> Hello there,
> In order to achieve such structure "pivoting" table and "grouping by" multiple columns. (i.e. as illustrated below), what would be the SQL implementation?
> The source query is:
>
> ````
> SELECT
> t1.date,
> t1.area,
> t1.canal,
> SUM(t1.peso) AS peso
> FROM table1 t1
> GROUP BY 1, 2, 3
> ORDER BY 1, 2, 3
>
> ````
>
> and source query generates a initial structure as in:
>
> ###########################
> date | area | canal | peso
> 2021-03-01 area1 can1 45.6768
> 2021-03-01 area1 can2 54.6768
> 2021-03-01 area1 can3 87.6
> 2021-03-01 area2 can1 1.87
> 2021-03-01 area2 can2 12.7687
> 2021-03-01 area2 can3 965.568
> 2021-03-01 area3 can1 968.95
> 2021-03-01 area3 can2 1.6867
> 2021-03-01 area3 can3 8.897
> …. … … ...
> 2021-06-01 area1 can1
> 2021-06-01 area1 can2
> 2021-06-01 area1 can3
> 2021-06-01 area2 can1
> … … ...
>
> 2021-12-01 area1
> 2021-12-01 area1
> 2021-12-01 area1
> ###########################
>
>
>
>
> Then, the goal's to achieve a final structure grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso".
> Plus, a partial total of each area, named as "total” .
>
> I tried to write a query to support such a display, however, I got stuck at pivoting date only to the column peso
>
> ````
> SELECT
> t2.area,
> t2.canal,
> ( SELECT
> month,
> peso_valor
> FROM (
> SELECT
> month(t1.date) month,
> t1.area,
> t1.canal,
> SUM(t1.peso_valor) AS peso_valor
> FROM tbl_data t1
> GROUP BY 1, 2, 3
> ORDER BY 1, 2, 3
> ) source_table
> ) PIVOT (
> peso
> FOR month IN (
> 1 Jan, 2 Feb, 3 Mar, 4 Apr, 5 Mai, 6 Jun, 7 Jul, 8 Aug, 9 Sep, 10 Oct, 11 Nov, 12 Dec
> )
> ) as pivot_table
> ORDER BY month
> ) as t2.peso
> FROM tbl_data t2
> GROUP BY 1, 2
> ````
>
>
> what would be the SQL implementation to achieve a structure as such ?
> (.i.e. grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso”.)
>
> illustrated bellow?
>
>
> Best wishes,
> I
>
>
>
>
>
>
>
>
> ###########################
> area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...
> peso peso peso peso
>
> area1 can1 45.6768 875.98 1.232 …
> area1 can2 54.6768 665.8 2.43 ...
> area1 can3 87.6 65.8 4.76 ...
> area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...
>
> area2 can1 1.87 … ...
> area2 can2 12.7687
> area2 can3 965.568
> area2 total SUM(1.87+12.7687+965.568) … ...
>
>
> area3 can1 968.95
> area3 can2 1.6867
> area3 can3 8.897
>
> ...
> ###########################
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Shaozhong SHI 2021-10-03 07:30:40 PostgreSQL CHECK Constraint
Previous Message Steve Midgley 2021-09-26 00:53:39 Re: Creating a query to structure results grouped by two columns, pivoting only a third colum