From: | "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | proposal - GROUPING SETS |
Date: | 2008-09-16 10:27:20 |
Message-ID: | 162867790809160327n23dd2f0eyd0f6ff6365afa25@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
== Proposal - GROUPING SETS ==
a grouping set feature allows multiple grouping clauses in one
query. Result of grouping sets is union of results each groupby
clause's result.
create table t(a int, b int);
insert into t values(10,20);
insert into t values(30,40);
select a, b from t group by grouping sets(a, b);
is same as:
select a, NULL from t group by a
union all
select NULL, b from t group by b;
Note: all ungrouped vars are transformed to NULL
Groupby clause should contains cube and rollup lists.
These are transformed to grouping sets via transformed rules:
create table t1(a int, b int, c int)
group by rollup(a, b, c)
->
group by grouping sets((a,b,c), (a,b), (a), ())
group by cube(a,b,c)
->
group by grouping sets ((a,b,c),(a,b),(a,c), (a), (b,c),
(b), (c), ())
Groupby clause or grouping sets should contains more sets.
Result is multiplication of these sets:
group by grouping sets(a), grouping sets(b, (b,c), ())
->
group by grouping sets((a,b), (a,b,c), (a))
When grouping sets are used, then we should to use grouping
and grouping_id functions. Function grouping returns 1 when
parameter is in current group set, else returns 0. Function
grouping_id returns value as
grouping_id(a,b,c) =
to_dec(to_bin(grouping(a) || grouping(b) || grouping(c)))
postgres=# select * from t;
a | b | c
----+----+----
10 | 20 | 30
10 | 20 | 30
(2 rows)
postgres=# select a,b,c from t group by grouping sets(a,b,c);
a | b | c
----+----+----
10 | |
| 20 |
| | 30
(3 rows)
postgres=# select a,b,c, grouping(a), grouping(b), grouping(c),
grouping_id(a,b,c) from t group by grouping sets(a,b,c);
a | b | c | grouping | grouping | grouping | grouping_id
----+----+----+----------+----------+----------+-------------
10 | | | 1 | 0 | 0 | 4
| 20 | | 0 | 1 | 0 | 2
| | 30 | 0 | 0 | 1 | 1
(3 rows)
some real sample:
create table report(
inserted date,
locality varchar,
name varchar,
c int);
postgres=# copy report to stdout;
2008-10-10 Prague Milk 10
2008-10-11 Prague Milk 12
2008-10-10 Prague Rum 2
2008-10-11 Prague Rum 6
2008-10-10 Berlin Milk 8
2008-10-11 Berlin Milk 14
2008-10-10 Berlin Beer 20
2008-10-11 Berlin Beer 25
postgres=# select * from report;
inserted | locality | name | c
------------+----------+------+----
2008-10-10 | Prague | Milk | 10
2008-10-11 | Prague | Milk | 12
2008-10-10 | Prague | Rum | 2
2008-10-11 | Prague | Rum | 6
2008-10-10 | Berlin | Milk | 8
2008-10-11 | Berlin | Milk | 14
2008-10-10 | Berlin | Beer | 20
2008-10-11 | Berlin | Beer | 25
(8 rows)
postgres=# select inserted, locality, name, sum(c)
from report
group by grouping sets(inserted, locality, name);
inserted | locality | name | sum
------------+----------+------+-----
2008-10-10 | | | 40
2008-10-11 | | | 57
| Berlin | | 67
| Prague | | 30
| | Milk | 44
| | Rum | 8
| | Beer | 45
(7 rows)
postgres=# select inserted, locality, name, sum(c)
from report
group by grouping sets(inserted, (locality, name));
inserted | locality | name | sum
------------+----------+------+-----
2008-10-10 | | | 40
2008-10-11 | | | 57
| Prague | Milk | 22
| Berlin | Milk | 22
| Berlin | Beer | 45
| Prague | Rum | 8
(6 rows)
postgres=# select inserted, locality, name, sum(c)
from report
group by name, grouping sets(inserted, locality);
inserted | locality | name | sum
------------+----------+------+-----
2008-10-11 | | Rum | 6
2008-10-10 | | Rum | 2
2008-10-10 | | Beer | 20
2008-10-11 | | Beer | 25
2008-10-10 | | Milk | 18
2008-10-11 | | Milk | 26
| Prague | Rum | 8
| Berlin | Beer | 45
| Berlin | Milk | 22
| Prague | Milk | 22
(10 rows)
== Implementation ==
Grouping sets introduce a new concept into SQL. One readed
tuple is multiple used. It's similar with WITH clause. It's little
bit dificult implement it for current PostgreSQL's executor.
In my prototype I used aux node Feeder. This node should to hold
only one tuple. I add new method for Agg node, that process only
one input tuple:
for (;;)
{
tuple = feeder->execute(grouping_sets->lefttree);
foreach(l, grouping_sets->subplans)
{
lfirst(l)->process(tuple);
}
if (is_null(tuple))
break;
}
It supports only HASH Agg nodes. For non hash agg should be used
other method (currently it isn't supported).
postgres=# explain verbose select inserted, locality, name, sum(c)
from report
group by name,grouping sets(inserted, locality);
QUERY PLAN
-----------------------------------------------------------------
Grouping Sets (cost=12.00..35.00 rows=400 width=68)
Output: NULL::date, locality, name, sum(c)
-> Seq Scan on report (cost=0.00..18.00 rows=800 width=68)
Output: inserted, locality, name, c
-> HashAggregate (cost=6.00..8.50 rows=200 width=68)
Output: inserted, NULL::character varying, name, sum(c)
-> Feeder (cost=0.00..0.00 rows=800 width=68)
Output: inserted, locality, name, c
-> HashAggregate (cost=6.00..8.50 rows=200 width=68)
Output: NULL::date, locality, name, sum(c)
-> Feeder (cost=0.00..0.00 rows=800 width=68)
Output: inserted, locality, name, c
(12 rows)
After work on prototype I don't see any problems in executor or
parser. I expect some dificulties in planner. With grouping sets
grouping_planner procedure will be much more complex:
1. targetlist and groupclause will be list of list,
2. we should to repeat estimation of NumGroups for each group
set (it's should be shared with CTE feature??).
== Parser problems ==
1. The identifier cube is used in contrib cube.
Solution: CUBE '(' ... ')' generates funcCall and it is transformed
to grouping sets only in groupby clause later.
I invite any ideas, notes and help with documentation.
Regards
Pavel Stehule
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2008-09-16 11:53:43 | Subtransaction commits and Hot Standby |
Previous Message | Marko Kreen | 2008-09-16 10:26:17 | Re: [Review] fix dblink security hole |