Quick Links

Re: Parallel Aggregate

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel Aggregate
Date:	2016-03-18 17:15:03
Message-ID:	CA+TgmobvoUo6rnS0p80jCE1R6UpN19ZLO0rqNz5U+b0gzny=Fw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> I've attached an updated patch.

This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.

+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial

I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out. I guess we can do it that
way for now, but it seems aggregate-specific - e.g. AVG() can't cope
with DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and
MAX() are the outliers in this regard, but they are a pretty common
case.

+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate

I think you should adjust the punctuation to "with a single executor
Agg node; however, there are". And maybe drop the word "executor".

And on the next line, I'd add a comma: "such as parallel aggregation,
when we want".

astate->xprstate.evalfunc =
(ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate =
(AggState *) parent;
-
- aggstate->aggs = lcons(astate,
aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in
non-Agg plan node");
}
- else
+ if (aggref->aggpartial ==
aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in
non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial
type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR,
"Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;

This seems like it involves more code rearrangement than is really
necessary here.

+ * Partial paths in the input rel could allow us to perform
aggregation in
+ * parallel, set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.

Change comma to semicolon or period.

/*
* Generate a HashAgg Path atop of the cheapest partial path, once
* again, we'll only do this if it looks as though the hash table won't
* exceed work_mem.
*/

Same here. Commas are not the way to connect two independent sentences.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Parallel Aggregate at 2016-03-18 13:16:51 from David Rowley

Responses

Re: Parallel Aggregate at 2016-03-19 00:52:10 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2016-03-18 17:18:48	Re: raw output from copy
Previous Message	Stas Kelvich	2016-03-18 16:50:54	Re: Speedup twophase transactions