From: | Олег Царев <zabivator(at)gmail(dot)com> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) |
Date: | 2009-05-10 13:40:53 |
Message-ID: | 54f48e4f0905100640l3ece5a3x6b6c3e7e3a91d260@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, Pavel.
I read you letter, and found some points:
a) Simple transformation any GROUPING SETS in GROUP BY + UNION ALL
require clone source for every group.
It's so slow.
My point: we can make for start simple implementation.
b) Your sollution it's so good, IMHO
WITH q AS (SELECT * FROM some)
SELECT * FROM q GROUP BY a
UNION ALL
SELECT * FROM q GROUP BY b
But can you describe me how it's used on planner (calculate cost) and exector.
Sharing source - it's mind - we don't process tree, we process
DIRECTED GRAPH. Hard, many bugs, so on, not?
PostgreSQL support sharing nodes of executor? How? Where i can read?
Next. Ok, we spool source. After we use simple hash group/sort group +
union all? Or more advanced techniques? If different group by - it's
not different from my idea, it's logic additional (spool source for
fast rewind)
c) Don't forget also about node requiments - if we use hash table as
container for source - we reduce some sorting properties - and query
select A,B from TABLE group by ((A,B),(B)) order by a,b require
sorting on output of grouping sets.
It's mind - we need insert sort.
d) We can make SORT ROLLUP and SORT REDUCE ROLLUP.
first - sort group by use sorting for grouping data and calculate aggregations.
So, 'order by A,B,C' also 'order by A,B' also order by 'A' also order by ''
What it's mind? It's mind we can use sort features for implement SORT
ROLLUP. For every sub group we calculate self aggregates... oh, i bad
say. Look on "select a,sum(b) from table group by a". Sort group
grouping by a, and for every a-group calculate aggregations (sum), for
ROLLUP A,B we can make three aggregations ; for A,B than A than () -
and use one path on source calculate aggregations on every step, and
for split source on different subsets.
It's more perfomance, than different group by and union all.
Look for MS SQL:
http://www.kodyaz.com/articles/sql-server-2005-cube-rollup-cannot-compute-distinct-aggregates.aspx
Why MS SQL don't support distinct aggregations? I think - because
every distinct aggregations in MS SQL require hash, and many
aggregations - it's so slow.
Thank you!
2009/5/10 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> Hello
>
> some other info is on http://wiki.postgresql.org/wiki/Grouping_Sets
>
> Maybe in e few weak I'll have some a prototype based on CTE. My older
> technique based on hashed tables should be well, but it carry lot of
> unshared code. Using CTE means so we can optimize CTE and GROUPING
> SETS together. I plan to transform query
>
> SELECT * FROM some GROUP BY GROUPING SETS(a, b)
>
> to
>
> WITH q AS (SELECT * FROM some)
> SELECT * FROM q GROUP BY a
> UNION ALL
> SELECT * FROM q GROUP BY b
>
>
> 2009/5/10 Олег Царев <zabivator(at)gmail(dot)com>:
>> Hello all.
>> Please, approve my ideas for implementation.
>>
>> Standart has feature T431: Extended grouping capabilities.
>> This feature i found in TODO-list:
>> http://wiki.postgresql.org/wiki/Todo -> SQL Commands -> TO DO
>>
>> MS SQL 2005 partial support this feature:
>> http://www.kodyaz.com/articles/sql-server-2005-cube-rollup-cannot-compute-distinct-aggregates.aspx
>> http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx
>>
>> MS SQL 2008 support this feature:
>> http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx
>>
>> Oracle support this feature:
>> http://www.compshack.com/sql/oracle-group-rollup
>>
>> So, it's short notes about GROUPING SETS, but more complete
>> information have in a official documentation of MS SQL and Oracle
>> (copyright limited for send as attach).
>>
>> First. GROUPG SETS.
>>
>> select A,B,C,SUM(D) from table group by GROUPING SETS( (A,B,C), (A),
>> () ) - it's example of use grouping sets.
>> Semantic of this construction - make group by over source more, than
>> one group of column.
>> It's very wide key - A,B C. In result set of this example we can find
>> result set of select select A,B,C,SUM(D) from table group by A,B,C -
>> as subset. It's mind: "GROUP BY A,B,C" - subset of "GROUP BY GROUPING
>> SETS( (A,B,C), (A), () )
>> Two subset - is GROUP BY A B, and instead C column we look NULL.
>> Third subset - GROUP BY (), instead A,B,C - NULL, one row - it's name
>> "GRAND TOTAL". - calculate over all subset without grouping
>>
>> Also have function "GROUPING" it's function say about null - "real
>> null" (from table) or generated by "GROUP BY GROUPING SETS"
>>
>> My point: this feature can implement over GROUP BY and UNION ALL
>> We can make decomposition of "GROUP BY GROUPING SETS( (A,B,C),(A),()
>> )" to select A,B,C fron table GROUP BY A,B,C .UNION ALL select
>> A,B,NULL from table group BY A,B UNION ALL NUll,NUll,NULL from table
>> group by();
>>
>> So, it's very simple, don't require modification of executor and
>> callibrate cost - only parser and semantic anylysis,
>> '
>> So, ROLLUP(A1,...An) is alias to "GROUP BY GROUPING SETS(
>> (A1,...,An),(A1,...,An-1),... (An-1,An),(An),() ),
>> CUBE - analogue.
>>
>> If this idea it's good - i can write code base on old patch
>> http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php or
>> from clean list (as you wish).
>
> This is suboptimal. When SELECT * FROM X GROUP BY GROUPING SETS ...,
> where X is some joined data, then you repeat JOIN on every grouping
> set. So your solution is simple for implementation, but it should be
> really slow.
>
> Regards
> Pavel Stehule
>
>>
>> In future i know how to implement ROLLUP more optimal (executor
>> iterator) and use this ROLLUP for optimisation another GROUP BY,
>> GROUPING SETS.
>>
>> Thanks.
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2009-05-10 14:05:06 | Re: SQL state in log_line_prefix |
Previous Message | Guillaume Smet | 2009-05-10 13:34:29 | Re: SQL state in log_line_prefix |