Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Date: 2016-06-17 02:03:29
Message-ID: CA+TgmoaAjwaCRBe8w2UbpZD45uQ+TYwbLW7Dzf-e=VAq4m8q2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 16, 2016 at 6:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Meh. I think this is probably telling us that trying to repurpose Aggref
> as the representation of a partial aggregate may have been mistaken.
> Or maybe just that fix_combine_agg_expr was a bad idea. It seems likely
> to me that that could have been dispensed with altogether in return for
> slightly more work in create_grouping_paths, for instance transforming
> the upper-level Aggrefs into something that looks more like
> Aggref(PartialAggref(original-arguments))
> whereupon the pre-existing match logic should be sufficient to replace
> the "PartialAggref(original-arguments)" subtree with a suitable Var
> in the upper aggregation plan node.

I don't mind if you feel the need to refactor this. In David's
original patch, he had a PartialAggref node that basically acted as a
wrapper for an Aggref node, effectively behaving like a flag on the
underlying Aggref. I thought that was ugly and argued for including
the required information in the Aggref itself. Now what you are
proposing here is a bit different and it may work better, but there
are tricky things like how it all deparses in the EXPLAIN output - you
may recall that I fixed a problem in that area a while back. We're
trying to represent a concept that doesn't exist at the SQL level, and
that makes things a bit complicated: if the Aggref's input is a
PartialAggref, will the deparse code be able to match that up to the
correct catalog entry? But if you've got a good idea, have at it.

>> aggtype is one of the fields
>> that gets compared as part of that process, and it won't be the same
>> if you make aggtype mean the result of that particular node rather
>> than the result of the aggregate. nodeAgg.c also uses aggtype for
>> matching purposes; not sure if there is a similar problem there or
>> not.
>
> AFAICS, nodeAgg.c's check on that is not logically necessary: if you have
> identical inputs and the same aggregate function then you should certainly
> expect the same output type. The only reason we're making it is that the
> code originally used equal() to detect identical aggregates, and somebody
> slavishly copied all the comparisons when splitting up that test so as to
> distinguish "identical inputs" from "identical aggregates". I'll reserve
> judgement on whether search_indexed_tlist_for_partial_aggref has any idea
> what it's doing in this regard.

Sure, it's all cargo-cult programming. I don't want to presume to
read David's mind, but I think we were both thinking that minimizing
the amount of tinkering that we did would cut down on the likelihood
of bugs. Now, you've found a bug that got through, so we can either
double down on the design we've already picked, or we can change it.
I'm not set on one approach or the other; I thought what we ended up
was fairly reasonable given the bloated mess that is struct Aggref,
but I've got no deep attachment to it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-06-17 02:17:56 Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Previous Message Greg Stark 2016-06-17 00:34:26 Re: 10.0