Re: Statistical aggregate functions are not working with PARTIAL aggregation

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Statistical aggregate functions are not working with PARTIAL aggregation
Date: 2019-05-20 12:25:32
Message-ID: CAKJS1f_Cc6D+Jm7oo=dQ0kdWqMuzDpWFE3PzR8kPy2DSnc8YsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 20 May 2019 at 19:59, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> - numArguments = get_aggregate_argtypes(aggref, inputTypes);
> + numTransFnArgs = get_aggregate_argtypes(aggref, transFnInputTypes);
>
> If the function retrieves argument types of transform functions,
> it would be better that the function name is
> get_aggregate_transargtypes() and Aggref.aggargtypes has the name
> like aggtransargtypes.

Probably that would be a better name.

> /* Detect how many arguments to pass to the finalfn */
> if (aggform->aggfinalextra)
> - peragg->numFinalArgs = numArguments + 1;
> + peragg->numFinalArgs = numTransFnArgs + 1;
> else
> peragg->numFinalArgs = numDirectArgs + 1;
>
> I can understand the aggfinalextra case, but cannot understand
> another. As Andres said I think the code requires an explanation
> of why the final args is not numTransFnArgs but *numDirectArgs
> plus 1*.

numDirectArgs will be 0 for anything apart from order-set aggregate,
so in this case, numFinalArgs will become 1, since the final function
just accepts the aggregate state.
For ordered-set aggregates like, say percentile_cont the finalfn also
needs the argument that was passed into the aggregate. e.g
percentile_cont(0.5), the final function needs to know about 0.5. In
this case 0.5 is the direct arg and the indirect args are what are in
the order by clause, x in WITHIN GROUP (ORDER BY x). At least that's
my understanding.

> + /*
> + * When combining there's only one input, the to-be-combined
> + * added transition value from below (this node's transition
> + * value is counted separately).
> + */
> + pertrans->numTransInputs = 1;
>
> I believe this works but why the member has not been set
> correctly by the creator of the aggsplit?

Not quite sure what you mean here. We need to set this based on if
we're dealing with a combine function or a trans function.

> + /* Detect how many arguments to pass to the transfn */
>
> I want to have a comment there that explains why what ordered-set
> requires is not numTransFnArgs + (#sort cols?), but
> "list_length(aggref->args)", or a comment that explanas why they
> are compatible to be expplained.

Normal aggregate ORDER BY clauses are handle in nodeAgg.c, but
ordered-set aggregate's WITHIN GROUP (ORDER BY ..) args are handled in
the aggregate's transition function.

> > I ended up also renaming aggtransfn to transfn_oid in
> > build_pertrans_for_aggref(). Having it called aggtranfn seems a bit
> > too close to the pg_aggregate.aggtransfn column which is confusion
> > given that we might pass it the value of the aggcombinefn column.
>
> Is Form_pg_aggregate->aggtransfn different thing from
> transfn_oid? It seems very confusing to me apart from the naming.

I don't think this is explained very well in the code, so I understand
your confusion. Some of the renaming work I've been trying to do is
to try to make this more clear.

Basically when we're doing the "Finalize Aggregate" stage after having
performed a previous "Partial Aggregate", we must re-aggregate all the
aggregate states that were made in the Partial Aggregate stage. Some
of these states might need to be combined together if they both belong
to the same group. That's done with the function mentioned in
pg_aggregate.aggcombinefn. Whether we're doing a normal "Aggregate"
or a "Finalize Aggregate" the actual work to do is not all that
different, the only real difference is that we're aggregating
previously aggregated states rather than normal values. Since the rest
of the work the same, we run it through the same code in nodeAgg.c,
only we use the pg_aggregate.aggcombinefn for "Finalize Aggregate" and
use pg_aggregate.aggtransfn for nodes shown as "Aggregate" and
"Partial Aggregate" in explain.

That's sort of simplified as really a node shown as "Partial
Aggregate" in EXPLAIN is just not calling the aggfinalfn and "Finalize
Aggregate" is. The code in nodeAgg.c technically supports
re-combining previously aggregated states and not finalizing them.
You might wonder why you might do that. It was partially a by-product
of how the code was written, but also I had in mind about clustered
servers aggregating large datasets on remote servers running parallel
aggregates on each server then each server sending back the partially
aggregated states back to the main server to be re-combined and
finalized -- 3-stage aggregation. Changes were made after the initial
partial aggregate commit to partially remove the ability to form paths
in this shape this in the planner code, but that's mostly just removed
support in explain.c and changing bool flags in favour of the AggSplit
enum which lacks a combination of values to do that. We'd need an
AGGSPLIT_COMBINE_SKIPFINAL_DESERIAL_SERIAL... or something, to make
that work. (I'm surprised not to see more AggSplit values for
partition-wise aggregate since that shouldn't need to serialize and
deserialize the states...)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-05-20 13:16:32 Re: [HACKERS] Unlogged tables cleanup
Previous Message Matwey V. Kornilov 2019-05-20 11:32:39 [PATCH v2] Introduce spgist quadtree @<(point,circle) operator