Re: Parallel Aggregate

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-15 20:23:26
Message-ID: CA+TgmoZf_fKVhn7UD-gHcsdj7pEK1td4qQ3uexs=M-L=y72-Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> More generally, why are we inventing PartialAggref
>> instead of reusing Aggref? The code comments seem to contain no
>> indication as to why we shouldn't need all the same details for
>> PartialAggref that we do for Aggref, instead of only a small subset of
>> them. Hmm... actually, it looks like PartialAggref is intended to
>> wrap Aggref, but that seems like a weird design. Why not make Aggref
>> itself DTRT? There's not enough explanation in the patch of what is
>> going on here and why.
>
> A comment does explain this, but perhaps it's not good enough, so I've
> rewritten it to become.
>
> /*
> * PartialAggref
> *
> * When partial aggregation is required in a plan, the nodes from the partial
> * aggregate node, up until the finalize aggregate node must pass the partially
> * aggregated states up the plan tree. In regards to target list construction
> * in setrefs.c, this requires that exprType() returns the state's type rather
> * than the final aggregate value's type, and since exprType() for Aggref is
> * coded to return the aggtype, this is not correct for us. We can't fix this
> * by going around modifying the Aggref to change it's return type as setrefs.c
> * requires searching for that Aggref using equals() which compares all fields
> * in Aggref, and changing the aggtype would cause such a comparison to fail.
> * To get around this problem we wrap the Aggref up in a PartialAggref, this
> * allows exprType() to return the correct type and we can handle a
> * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
> * the underlying Aggref. The PartialAggref lives as long as executor start-up,
> * where it's removed and replaced with it's underlying Aggref.
> */
> typedef struct PartialAggref
>
> does that help explain it better?

I still think that's solving the problem the wrong way. Why can't
exprType(), when applied to the Aggref, do something like this?

{
Aggref *aref = (Aggref *) expr;
if (aref->aggpartial)
return aref->aggtranstype;
else
return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref". But shouldn't they? From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.

>> I don't see where this applies has_parallel_hazard or anything
>> comparable to the aggregate functions. I think it needs to do that.
>
> Not sure what you mean here.

If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table. It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.

>> + /* XXX +1 ? do we expect the main process to actually do real work? */
>> + numPartialGroups = Min(numGroups, subpath->rows) *
>> + (subpath->parallel_degree + 1);
>> I'd leave out the + 1, but I don't think it matters much.
>
> Actually I meant to ask you about this. I see that subpath->rows is
> divided by the Path's parallel_degree, but it seems the main process
> does some work too, so this is why I added + 1, as during my tests
> using a query which produces 10 groups, and had 4 workers, I noticed
> that Gather was getting 50 groups back, rather than 40, I assumed this
> is because the main process is helping too, but from my reading of the
> parallel query threads I believe this is because the Gather, instead
> of sitting around idle tries to do a bit of work too, if it appears
> that nothing else is happening quickly enough. I should probably go
> read nodeGather.c to learn that though.

Yes, the main process does do some work, but less and less as the
query gets more complicated. See comments in cost_seqscan().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-15 20:28:45 Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Previous Message Aleksander Alekseev 2016-03-15 20:19:09 Re: Small patch: fix warnings during compilation on FreeBSD