From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Aggregate |
Date: | 2015-10-20 10:23:18 |
Message-ID: | CAKJS1f_9LvWFf4gnxg2HEK+mFL6pOCLf1S3gj6k+dLj579=MdQ@mail.gmail.com |
Lists: | pgsql-hackers |
On 13 October 2015 at 20:57, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:
> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > On 13 October 2015 at 17:09, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> > wrote:
> >>
> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> >> wrote:
> >> > Also, I think the path for parallel aggregation should probably be
> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> >> > path here. I'm not clear whether that is what you are thinking or
> >> > not.
> >>
> >> No. I am thinking of the following way.
> >> Gather->partialagg->some partial path
> >>
> >> I want the Gather node to merge the results coming from all workers;
> >> otherwise it may be difficult to merge them at the parent of the Gather
> >> node. When the partial group aggregate is under the Gather node and any
> >> two workers return data for the same group key, we need to compare those
> >> rows and combine them into a single group. At the Gather node, we can
> >> wait until we have slots from all workers. Once all workers have returned
> >> their slots, we can compare and merge the necessary slots and return the
> >> result. Am I missing something?
> >
> >
> > My assumption is the same as Robert's here.
> > Unless I've misunderstood, it sounds like you're proposing to add logic
> > into the Gather node to handle final aggregation? That sounds like a
> > modularity violation of the whole node concept.
> >
> > The handling of the final aggregate stage is not all that different from
> > the initial aggregate stage. The primary difference is just that you're
> > calling the combine function instead of the transition function, and the
> > values
>
> Yes, you are correct. Until now I had been thinking of using transition
> types as the approach, and for that reason alone I proposed that the Gather
> node handle the final aggregation.
>
> > being aggregated are aggregate states rather than the type of the values
> > which were initially aggregated. The handling of GROUP BY is all the
> > same, yet you only apply the HAVING clause during final aggregation. This
> > is why I ended up implementing this in nodeAgg.c instead of inventing
> > some new node type that's mostly a copy and paste of nodeAgg.c [1]
>
> After going through your Partial Aggregation / GROUP BY before JOIN patch,
> the following is my understanding of parallel aggregate.
>
> Finalize [hash] aggregate
>   -> Gather
>     -> Partial [hash] aggregate
>
> The data that comes from the Gather node contains the group key and the
> grouping results. Based on these, in the case of hash aggregate we can
> build another hash table at the finalize aggregate and return the final
> results. This approach works for both plain and hash aggregates.
>
> For group aggregate support in parallel aggregate, the plan should be as
> follows.
>
> Finalize Group aggregate
>   -> sort
>     -> Gather
>       -> Partial group aggregate
>         -> sort
>
> The data that comes from the Gather node needs to be sorted again on the
> grouping key, then merged to generate the final grouping result.
>
> With this approach, we do not need to change anything in the Gather node.
> Is my understanding correct?
>
>
Our understandings are aligned.
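
For readers following the thread, the plan shapes discussed above can be sketched in a few lines of Python. This is only an illustrative model, not PostgreSQL code: the avg()-style state of (sum, count) and every function name below (transition, combine, final, partial_agg, gather, finalize_hash, finalize_group) are assumptions made up for the example. It tries to show that the Gather step only concatenates worker output, that the finalize step calls the combine function on aggregate states rather than the transition function on raw values, and that HAVING is applied only at the final stage.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical avg() aggregate: the state is a (sum, count) pair.
def transition(state, value):
    """Fold one raw input value into an aggregate state."""
    total, count = state
    return (total + value, count + 1)

def combine(a, b):
    """Merge two aggregate states produced by different workers."""
    return (a[0] + b[0], a[1] + b[1])

def final(state):
    """Turn a finished state into the aggregate result."""
    return state[0] / state[1]

def partial_agg(rows):
    """Partial aggregate in one worker; rows are (key, value) pairs."""
    table = {}
    for key, value in rows:
        table[key] = transition(table.get(key, (0, 0)), value)
    return list(table.items())

def gather(worker_outputs):
    """Gather only concatenates worker output; it holds no aggregate logic."""
    return [row for output in worker_outputs for row in output]

def finalize_hash(partials, having=None):
    """Finalize via a second hash table keyed on the group key."""
    table = {}
    for key, state in partials:
        table[key] = combine(table[key], state) if key in table else state
    results = {key: final(state) for key, state in table.items()}
    # HAVING is evaluated only here, on the final results.
    return {k: v for k, v in results.items() if having is None or having(v)}

def finalize_group(partials, having=None):
    """Finalize by re-sorting on the group key and merging adjacent rows."""
    results = {}
    ordered = sorted(partials, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        state = None
        for _, partial_state in group:
            state = partial_state if state is None else combine(state, partial_state)
        value = final(state)
        if having is None or having(value):
            results[key] = value
    return results

# Two workers see overlapping group keys.
worker1 = [("a", 1), ("b", 10), ("a", 3)]
worker2 = [("a", 5), ("c", 7), ("b", 20)]
gathered = gather([partial_agg(worker1), partial_agg(worker2)])
print(finalize_hash(gathered))
print(finalize_group(gathered, having=lambda avg: avg > 5))
```

Both finalize variants consume the same gathered rows; the sorted variant needs the extra sort because rows arriving from different workers are not globally ordered on the group key.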
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services