Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)

From: Олег Царев <zabivator(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date: 2009-05-12 22:13:38
Message-ID: 54f48e4f0905121513n840a44dredb8ccf56317ab7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all!
If no one objecte (all agree, in other say) i continue work on patch -
particulary, i want support second strategy (tuple store instead of
hash-table) for save order of source (more cheap solution in case with
grouping sets + order by), investigate and brainstorm another
optimisation, writing regression tests and technical documentation.
But I need some time for complete my investigation internals of
PostgreSQL, particulary CTE.

Thanks.

2009/5/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> Hello Oleg
>
> I am sending a new CTE based variant of my GROUPING SETS patch,
>
> this patch has some bugs but it is good prototype (it's more stable
> than old patch):
>
> postgres=# select selling_date, baguette, sum(items) from
> baguette_selling group by grouping sets(1,2);
>  selling_date | baguette | sum
> --------------+----------+-----
>  2007-10-30   |          |  17
>  2007-10-31   |          |  12
>              | golf     |   9
>              | buster   |  20
> (4 rows)
>
> postgres=# select selling_date, baguette, sum(items),
> grouping(selling_date), grouping(baguette), grouping_id(selling_date,
> baguette) from baguette_selling group by grouping sets(1,2);
>  selling_date | baguette | sum | grouping | grouping | grouping_id
> --------------+----------+-----+----------+----------+-------------
>  2007-10-30   |          |  17 |        1 |        0 |           2
>  2007-10-31   |          |  12 |        1 |        0 |           2
>              | golf     |   9 |        0 |        1 |           1
>              | buster   |  20 |        0 |        1 |           1
> (4 rows)
>
> postgres=# select selling_date, baguette, sum(items),
> grouping(selling_date), grouping(baguette), grouping_id(selling_date,
> baguette) from baguette_selling group by grouping sets(1,2,());
>  selling_date | baguette | sum | grouping | grouping | grouping_id
> --------------+----------+-----+----------+----------+-------------
>  2007-10-30   |          |  17 |        1 |        0 |           2
>  2007-10-31   |          |  12 |        1 |        0 |           2
>              | golf     |   9 |        0 |        1 |           1
>              | buster   |  20 |        0 |        1 |           1
>              |          |  29 |        0 |        0 |           0
> (5 rows)
>
> I thing so parser part is well and correct (and ported to 8.4).
>
> CTE works well, but not 100% effective, and will be better to use
> direct tuplestore interface (as second technique - when hash tables
> can't to be used).
>
> I am thinking, so the best solution is enhancing current Aggregate
> node for support of GroupingSets. The code base on UNION ALL is +/-
> equal to CTE, and I don't thing, so this should be optimal. But work
> freely, please. I have not free time for this patch next two months.
> So if you have time, it's your.
>
> regards
> Pavel Stehule
>
>
>
> 2009/5/10 Олег Царев <zabivator(at)gmail(dot)com>:
>> Hello all.
>> Please, approve my ideas for implementation.
>>
>> Standart has feature T431: Extended grouping capabilities.
>> This feature i found in TODO-list:
>> http://wiki.postgresql.org/wiki/Todo -> SQL Commands -> TO DO
>>
>> MS SQL 2005 partial support this feature:
>> http://www.kodyaz.com/articles/sql-server-2005-cube-rollup-cannot-compute-distinct-aggregates.aspx
>> http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx
>>
>> MS SQL 2008 support this feature:
>> http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx
>>
>> Oracle support this feature:
>> http://www.compshack.com/sql/oracle-group-rollup
>>
>> So, it's short notes about GROUPING SETS, but more complete
>> information have in a official documentation of MS SQL and Oracle
>> (copyright limited for send as attach).
>>
>> First. GROUPG SETS.
>>
>> select A,B,C,SUM(D) from table group by GROUPING SETS( (A,B,C), (A),
>> () ) - it's example of use grouping sets.
>> Semantic of this construction - make group by over source more, than
>> one group of column.
>> It's very wide key - A,B C. In result set of this example we can find
>> result set of select   select A,B,C,SUM(D) from table group by A,B,C -
>> as subset. It's mind: "GROUP BY A,B,C" - subset of "GROUP BY GROUPING
>> SETS( (A,B,C), (A), () )
>> Two subset - is GROUP BY A B, and instead C column we look NULL.
>> Third subset - GROUP BY (), instead A,B,C - NULL, one row - it's name
>> "GRAND TOTAL". - calculate over all subset without grouping
>>
>> Also have function "GROUPING"  it's function say about null - "real
>> null" (from table) or generated by "GROUP BY GROUPING SETS"
>>
>> My point: this feature can implement over GROUP BY and UNION ALL
>> We can make decomposition of "GROUP BY GROUPING SETS( (A,B,C),(A),()
>> )" to  select A,B,C fron table GROUP BY A,B,C .UNION  ALL select
>> A,B,NULL from table group BY A,B UNION ALL NUll,NUll,NULL from table
>> group by();
>>
>> So, it's very simple, don't require modification of executor and
>> callibrate cost - only parser and semantic anylysis,
>> '
>> So, ROLLUP(A1,...An) is alias to "GROUP BY GROUPING SETS(
>> (A1,...,An),(A1,...,An-1),... (An-1,An),(An),() ),
>> CUBE - analogue.
>>
>> If this idea it's good -  i can write code base on old patch
>> http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php or
>> from clean list (as you wish).
>>
>> In future i know how to implement ROLLUP more optimal (executor
>> iterator) and use this ROLLUP for optimisation another GROUP BY,
>> GROUPING SETS.
>>
>> Thanks.
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-05-13 02:01:53 Re: idea: global temp tables
Previous Message Tom Lane 2009-05-12 21:55:13 Fixing the libxml memory allocation situation