Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2015-10-20 10:23:18
Message-ID: CAKJS1f_9LvWFf4gnxg2HEK+mFL6pOCLf1S3gj6k+dLj579=MdQ@mail.gmail.com
Lists: pgsql-hackers

On 13 October 2015 at 20:57, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > On 13 October 2015 at 17:09, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> > wrote:
> >>
> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> >> wrote:
> >> > Also, I think the path for parallel aggregation should probably be
> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> >> > path here. I'm not clear whether that is what you are thinking or
> >> > not.
> >>
> >> No. I am thinking of the following way.
> >> Gather->partialagg->some partial path
> >>
> >> I want the Gather node to merge the results coming from all workers;
> >> otherwise it may be difficult to merge them at the parent of the Gather
> >> node. If the partial group aggregate is under the Gather node and any
> >> two workers return data for the same group key, we need to compare
> >> those rows and combine them into a single group. At the Gather node we
> >> can wait until we get slots from all workers. Once all workers have
> >> returned their slots, we can compare and merge the necessary slots and
> >> return the result. Am I missing something?
> >
> >
> > My assumption is the same as Robert's here.
> > Unless I've misunderstood, it sounds like you're proposing to add logic
> > into the Gather node to handle final aggregation? That sounds like a
> > modularity violation of the whole node concept.
> >
> > The handling of the final aggregate stage is not all that different from
> > the initial aggregate stage. The primary difference is just that you're
> > calling the combine function instead of the transition function, and the
> > values
>
> Yes, you are correct. Until now I was thinking of using transition types
> as the approach, and for that reason alone I proposed having the Gather
> node handle the finalize aggregation.
>
> > being aggregated are aggregate states rather than the type of the values
> > which were initially aggregated. The handling of GROUP BY is all the
> > same, yet you only apply the HAVING clause during final aggregation. This
> > is why I ended up implementing this in nodeAgg.c instead of inventing some
> > new node type that's mostly a copy and paste of nodeAgg.c [1]
>
> After going through your Partial Aggregation / GROUP BY before JOIN patch,
> the following is my understanding of parallel aggregate.
>
> Finalize [hash] aggregate
>   -> Gather
>     -> Partial [hash] aggregate
>
> The data that comes from the Gather node contains the group key and the
> partial grouping results. From these, the finalize aggregate can build
> another hash table (in the hash aggregate case) and return the final
> results. This approach works for both plain and hash aggregates.
>
> For group aggregate support of parallel aggregate, the plan should be as
> follows.
>
> Finalize Group aggregate
>   -> sort
>     -> Gather
>       -> Partial group aggregate
>         -> sort
>
> The data that comes from the Gather node needs to be sorted again on the
> grouping key and then merged to generate the final grouping result.
>
> With this approach, we do not need to change anything in the Gather node.
> Is my understanding correct?
>
>
Our understandings are aligned.
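
To make the agreed plan shape concrete, here is a minimal sketch in Python. It is not PostgreSQL code. Every function name below is hypothetical, and avg() with a (sum, count) state is only an assumed example aggregate. It shows the partial stage calling the transition function, Gather doing no aggregation work at all, and the finalize stage calling the combine function and applying HAVING, in both the hash and the sort-based variants.

```python
from collections import defaultdict
from itertools import groupby

# Illustrative only: none of these names are PostgreSQL APIs.
# The assumed aggregate is avg(), whose state is a (sum, count) pair.

def transition(state, value):
    """Fold one input value into an aggregate state (partial stage)."""
    total, count = state
    return (total + value, count + 1)

def combine(a, b):
    """Merge two aggregate states for the same group key (finalize stage)."""
    return (a[0] + b[0], a[1] + b[1])

def final(state):
    """Turn an aggregate state into the final avg() result."""
    total, count = state
    return total / count

def partial_hash_agg(rows):
    """Per-worker step: returns (group key, partial state) pairs."""
    table = defaultdict(lambda: (0, 0))
    for key, value in rows:
        table[key] = transition(table[key], value)
    return list(table.items())

def gather(worker_outputs):
    """Gather only concatenates worker output; it has no aggregate logic."""
    return [row for output in worker_outputs for row in output]

def finalize_hash_agg(partials, having=lambda result: True):
    """Build a second hash table over the gathered states, then finalize."""
    table = {}
    for key, state in partials:
        table[key] = combine(table[key], state) if key in table else state
    results = {key: final(state) for key, state in table.items()}
    # HAVING is applied only here, on the final results.
    return {key: value for key, value in results.items() if having(value)}

def finalize_group_agg(partials, having=lambda result: True):
    """Sort the gathered states on the group key, then merge adjacent rows."""
    ordered = sorted(partials, key=lambda row: row[0])
    output = []
    for key, group in groupby(ordered, key=lambda row: row[0]):
        states = [state for _, state in group]
        merged = states[0]
        for state in states[1:]:
            merged = combine(merged, state)
        result = final(merged)
        if having(result):
            output.append((key, result))
    return output
```

With two workers' row sets, gather([partial_hash_agg(w) for w in workers]) yields rows that either finalize function can consume. Because the same group key may arrive from more than one worker, the sort-based variant has to sort again above the Gather step.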

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
