Re: Combining Aggregates

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-03-15 17:39:54
Message-ID: 46e012ac-424a-c90f-8d0b-86097a79fef1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 03/14/2016 05:45 AM, David Rowley wrote:
> On 14 March 2016 at 15:20, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> Current patch:
>> I've now updated the patch to base it on top of the parallel aggregate
>> patch in [2]. To apply the attached, you must apply [2] first!
>
> ...
>> [2] http://www.postgresql.org/message-id/CAKJS1f9Tr-+9aKWZ1XuHUnEJZ7GKKfo45b+4fCNj8DkrDZYK4g@mail.gmail.com
>
> The attached fixes a small thinko in apply_partialaggref_nodes().

After looking at the parallel aggregate patch, I also looked at this
one, as it's naturally related. Sadly I haven't found any issue that I
could nag about ;-) The patch seems well baked, as it was in the oven
for quite a long time already.

The one concern I do have is that it only adds (de)serialize functions
for SUM(numeric) and AVG(numeric). I think it's reasonable not to
include that into the patch, but it will look a bit silly if that's all
that gets into 9.6.

It's true plenty of aggregates is supported out of the box, as they use
fixed-length data types for the aggregate state. But imagine users happy
that their aggregate queries get parallelized, only to find out that
sum(int8) disables that :-/

So I think we should try to support as many of the aggregates using
internal as possible - it seems quite straight-forward to do that at
least for the aggregates using the same internal representation (avg,
sum, var_samp, var_pop, variance, ...). That's 26 aggregates out of the
45 with aggtranstype=INTERNAL.

For the remaining aggregates (jsonb_*. json_*, string_agg, array_agg,
percentile_) it's probably more complicated to serialize the state, and
maybe even impossible to do that correctly (e.g. string_agg accumulates
the strings in the order as received, but partial paths would build
private lists - not sure if this can actually happen in practice).

I think it would be good to actually document which aggregates support
parallel execution, and which don't. Currently the docs don't mention
this at all, and it's tricky for regular users to deduce that as it's
related to the type of the internal state and not to the input types. An
example of that is the SUM(bigint) example mentioned above.

That's actually the one thing I think the patch is missing.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-15 17:42:46 Re: IF (NOT) EXISTS in psql-completion
Previous Message David Steele 2016-03-15 17:38:49 Re: Password identifiers, protocol aging and SCRAM protocol