Re: Rethinking representation of partial-aggregate steps

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rethinking representation of partial-aggregate steps
Date: 2016-06-23 17:12:18
Message-ID: CA+TgmoYNZiCbbrmy15VWGMGH8qFhwixTM78WeuEWm8gP2x89rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 23, 2016 at 12:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I really dislike this aspect of the partial-aggregate patch:
>
> typedef struct Agg
> {
> ...
> bool combineStates; /* input tuples contain transition states */
> bool finalizeAggs; /* should we call the finalfn on agg states? */
> bool serialStates; /* should agg states be (de)serialized? */
> ...
> } Agg;
>
> Representing the various partial-aggregate behaviors as three bools seems
> like a bad idea to me. In the first place, this makes it look like there
> are as many as eight possible behaviors, whereas only a subset of those
> flag combinations are or ever will be supported (not that the comments
> anywhere tell you that, much less which ones). In the second place,
> it leads to unreadable code like this:
>
> /* partial phase */
> count_agg_clauses(root, (Node *) partial_grouping_target->exprs,
> &agg_partial_costs, false, false, true);
>
> /* final phase */
> count_agg_clauses(root, (Node *) target->exprs, &agg_final_costs,
> true, true, true);
> count_agg_clauses(root, parse->havingQual, &agg_final_costs, true,
> true, true);
>
> Good luck telling whether those falses and trues actually do what they
> should.
>
> I'm inclined to think that we should aggressively simplify here, perhaps
> replacing all three of these flags with a three-way enum, say
>
> PARTIALAGG_NORMAL /* simple aggregation */
> PARTIALAGG_PARTIAL /* lower phase of partial aggregation */
> PARTIALAGG_FINAL /* upper phase of partial aggregation */
>
> This embodies a couple of judgments that maybe someone would quibble with:
> * we don't have any use for intermediate partial aggregation steps;
> * we don't have any use for partial aggregation in which data transferred
> up needn't be serialized.
>
> But I can't really see how either of those would make sense for any
> practical use-case, and I don't think we need to have a confusing and
> bug-prone API in order to hold the door open to support them.

Hmm, well I guess I would have to disagree with the idea that those
other modes aren't useful. I seem to recall that David had some
performance results showing that pushing partial aggregate steps below
an Append node resulted in a noticeable speed-up even in the absence
of any parallelism, presumably because it avoids whatever projection
the Append might do, and also improves icache and dcache locality. So
it seems quite possible that you might want to have something like
this:

FinalizeAggregate
-> Gather
-> IntermediateAggregate
-> Append
-> PartialAggregate
-> Parallel SeqScan
-> PartialAggregate
-> Parallel SeqScan
-> PartialAggregate
-> Parallel SeqScan
-> PartialAggregate
-> Parallel SeqScan

The partial aggregate plans need not serialize, because they are
passing data to the same process, and the intermediate aggregate is
there, too.

I do agree, however, that the three Boolean flags don't make the code
entirely easy to read. What I might suggest is that we replace the
three Boolean flags with integer flags, something like this:

#define AGGOP_COMBINESTATES 0x1
#define AGGOP_SERIALIZESTATES 0x2
#define AGGOP_FINALIZEAGGS 0x4

#define AGGTYPE_SIMPLE (AGGOP_FINALIZEAGGS)
#define AGGTYPE_PARTIAL_SERIAL (AGGOP_SERIALIZESTATES)
#define AGGTYPE_FINALIZE_SERIAL
(AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS|AGG_SERIALIZESTATES)

Code that requests behavior can do so with the AGGTYPE_* constants,
but code that implements behavior can test AGGOP_* constants. Then if
we decide we need new combinations in the future, we can just add
those --- e.g. AGGTYPE_PARTIAL = 0, AGGTYPE_FINALIZE =
AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS, AGGTYPE_INTERMEDIATE =
AGGOP_COMBINESTATES --- and have some hope that it will mostly work.

I actually think nearly all of the combinations are sensible, here,
although I think it may have been a mistake to overload the "serialize
states" flag to mean both "does this combining aggregate expect to
receive the serial type rather than the transition type?" and also
"does this partial aggregate output the serial type rather than the
transition type?". In the example above, you'd want the
IntermediateAggregate to expect the transition type but output the
serial type. Oops.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-06-23 17:12:28 Re: Bug in to_timestamp().
Previous Message Alex Ignatov 2016-06-23 16:50:53 Re: Bug in to_timestamp().