From: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> |
Subject: | Re: Final Patch for GROUPING SETS |
Date: | 2014-12-22 23:57:43 |
Message-ID: | 87wq5jgv1c.fsf@news-spur.riddles.org.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>>>>> "Robert" == Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I would be interested in seeing more good examples of the size and
>> type of grouping sets used in typical queries.
Robert> From what I have seen, there is interest in being able to do
Robert> things like GROUP BY CUBE(a, b, c, d) and have that be
Robert> efficient.
Yes, but that's not telling me anything I didn't already know.
What I'm curious about is things like:
- what's the largest cube(...) people actually make use of in practice
- do people make much use of the ability to mix cube and rollup, or
take the cross product of multiple grouping sets
- what's the most complex GROUPING SETS clause anyone has seen in
common use
Robert> That will require 16 different groupings, and we really want
Robert> to minimize the number of times we have to re-sort to get all
Robert> of those done. For example, if we start by sorting on (a, b,
Robert> c, d), we want to then make a single pass over the data
Robert> computing the aggregates with (a, b, c, d), (a, b, c), (a,
Robert> b), (a), and () as the grouping columns.
In the case of cube(a,b,c,d), our code currently gives:
b,d,a,c: (b,d,a,c),(b,d)
a,b,d: (a,b,d),(a,b)
d,a,c: (d,a,c),(d,a),(d)
c,d: (c,d),(c)
b,c,d: (b,c,d),(b,c),(b)
a,c,b: (a,c,b),(a,c),(a),()
There is no solution in less than 6 sorts. (There are many possible
solutions in 6 sorts, but we don't attempt to prefer one over
another. The minimum number of sorts for a cube of N dimensions is
obviously N! / (r! * (N-r)!) where r = floor(N/2).)
If you want the theory: the set of grouping sets is a poset ordered by
set inclusion; what we want is a minimal partition of this poset into
chains (since any chain can be processed in one pass), which happens
to be equivalent to the problem of maximum cardinality matching in a
bipartite graph, which we solve in polynomial time with the
Hopcroft-Karp algorithm. This guarantees us a minimal solution for
any combination of grouping sets however specified, not just for
cubes.
--
Andrew (irc:RhodiumToad)
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2014-12-23 00:34:27 | Re: Doing better at HINTing an appropriate column within errorMissingColumn() |
Previous Message | Alvaro Herrera | 2014-12-22 23:55:17 | Re: pgbench -f and vacuum |