Re: WIP Patch for GROUPING SETS phase 1

From:	Svenne Krap <svenne(at)krap(dot)dk>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: WIP Patch for GROUPING SETS phase 1
Date:	2015-04-20 08:36:58
Message-ID:	20150420083658.2543.70336.pgcf@coridan.postgresql.org
Lists:	pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Hi,

I have (finally) found time to review this.

As far as I can see, the syntax is as per spec, and the queries I have tested have all produced the correct output.

The documentation looks good and is clear.

I think it is spec compliant, but I am not familiar enough with the spec to be sure. I also have not understood the function of the <set quantifier> (DISTINCT, ALL) part in the GROUP BY clause, and so have not tested it. That is why I haven't marked the "spec compliant" item.

The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (a sorting problem, judging by the diff), which should be unrelated to GSP. I don't know the check well enough to tell whether the GSP tests had already run by that point.

I have also been running a few tests on some real data. These were run on my laptop with 32 GB of memory and a fast SSD.

The first dataset is a join between a data table of 472 MB (4.3 million rows) and a tiny multi-column lookup table. I am returning a count(*).
Here the data is hierarchical, so CUBE does not make sense. GROUPING SETS and ROLLUP both work fine, and if work_mem is large enough the GS query slightly beats the handwritten "union all" equivalent (runtimes of 7.6 seconds vs. 7.7 seconds). If work_mem is at the default 4MB, the union-all equivalent (UAE) beats the GS query almost 2:1 due to disk spill (14.3 (GS) vs. 8.2 (UAE) seconds).
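
To make the comparison concrete, here is a minimal sketch of what a "union all" equivalent of a two-level ROLLUP looks like. The table `facts` and the columns `region` and `city` are hypothetical stand-ins, not the dataset from the review. SQLite has no GROUPING SETS support, so only the UAE form is executed here; the ROLLUP form is kept as a string for reference and cross-checked against a plain Python count.

```python
import sqlite3
from collections import Counter

# Hypothetical hierarchical data: every city belongs to exactly one region.
ROWS = [("north", "a"), ("north", "a"), ("north", "b"), ("south", "c")]

# The form under review (PostgreSQL with the patch applied); not runnable in SQLite.
GS_QUERY = """
SELECT region, city, count(*) FROM facts
GROUP BY ROLLUP (region, city)
"""

# Handwritten "union all" equivalent: one GROUP BY per grouping set.
UAE_QUERY = """
SELECT region, city, count(*) FROM facts GROUP BY region, city
UNION ALL
SELECT region, NULL, count(*) FROM facts GROUP BY region
UNION ALL
SELECT NULL, NULL, count(*) FROM facts
"""

def run_uae(rows):
    """Load the rows into an in-memory SQLite table and run the UAE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (region TEXT, city TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?)", rows)
    result = conn.execute(UAE_QUERY).fetchall()
    conn.close()
    return result

def rollup_reference(rows):
    """Pure-Python counts for ROLLUP(region, city), used as a cross-check.

    Assumes the input columns contain no NULLs.
    """
    counts = Counter()
    for region, city in rows:
        counts[(region, city)] += 1   # grouping set (region, city)
        counts[(region, None)] += 1   # grouping set (region)
        counts[(None, None)] += 1     # grouping set ()
    return [(r, c, n) for (r, c), n in counts.items()]
```

Note that the UAE form scans the table once per branch, while the GS form can work from a single sorted pass, which is presumably where the sensitivity to sort memory comes from.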

The other query is on the same data table as before, but with three "columns" (two calculated and one natural) for a cube. I am returning a count(*).
First column is "extract year from date column"
Second column is "divide a value by something and truncate" (i.e. make buckets)
Third column is a literal integer column.
Here the GS version is slightly slower than the UAE version (17.5 vs. 14.2). Nothing in the explain (analyze, buffers, costs, timing) output makes it obvious why.
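
For scale, a CUBE over three columns expands to 2^3 = 8 grouping sets, so the UAE has eight branches. The sketch below enumerates them and builds the corresponding query text. The names `yr`, `bucket`, `code` and `facts` are hypothetical, and it assumes the two calculated expressions are already exposed as plain columns (for example through a subquery or view).

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every subset of columns, largest first, as CUBE would expand them."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets

def union_all_equivalent(table, columns):
    """Build a "union all" query with one branch per grouping set."""
    branches = []
    for gset in cube_grouping_sets(columns):
        # Columns that are not part of this grouping set are returned as NULL.
        select_list = [c if c in gset else "NULL" for c in columns]
        sql = f"SELECT {', '.join(select_list)}, count(*) FROM {table}"
        if gset:
            sql += " GROUP BY " + ", ".join(gset)
        branches.append(sql)
    return "\nUNION ALL\n".join(branches)

if __name__ == "__main__":
    # Hypothetical table and column names.
    print(union_all_equivalent("facts", ["yr", "bucket", "code"]))
```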

I have the explains, but as the dataset is semi-private and I don't have an easy way to edit the names out of them, I will send them on request (non-disclosure from the recipient is of course a must) rather than post them on the list.

I think the feature is ready to be committed, but I am unsure whether I am qualified to gauge that :)

/Svenne

The new status of this patch is: Ready for Committer

In response to

Re: WIP Patch for GROUPING SETS phase 1 at 2015-03-27 08:03:59 from Svenne Krap

Responses

Re: WIP Patch for GROUPING SETS phase 1 at 2015-04-20 08:40:28 from Svenne Krap
Re: WIP Patch for GROUPING SETS phase 1 at 2015-04-21 15:06:58 from Andrew Gierth