Re: PoC: Grouped base relation

From: Antonin Houska <ah(at)cybertec(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PoC: Grouped base relation
Date: 2017-01-17 16:37:10
Message-ID: 4410.1484671030@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> [... snip ]]
>
> This all works well, as long as the aggregate is "summing" something
> across rows. The method doesn't work when aggregation is say
> "multiplying" across the rows or "concatenating" across the rows like
> array_agg() or string_agg(). They need a different strategy to combine
> aggregates across relations.

Good point. The common characteristic of these seems to be that thay don't
have aggcombinefn defined.

> IIUC, we are trying to solve multiple problems here:

> 1. Pushing down aggregates/groups down join tree, so that the number of rows
> to be joined decreases. This might be a good optimization to have. However
> there are problems in the current patch. Every path built for a relation
> (join or base) returns the same result expressed by the relation or its
> subset restricted by parameterization or unification. But this patch changes
> that. It creates paths which represent grouping in the base relation. I
> think, we need a separate relation to represent that result and hold paths
> which produce that result. That itself would be a sizable patch.

Whether a separate relation (RelOptInfo) should be created for grouped
relation is an important design decision indeed. More important than your
argument about the same result ("partial path", used to implement parallel
nodes actually does not fit this criterion perfectly - it only returns part of
the set) is the fact that the data type (target) differs.

I even spent some time coding a prototype where separate RelOptInfo is created
for the grouped relation but it was much more invasive. In particular, if only
some relations are grouped, it's hard to join them with non-grouped ones w/o
changing make_rel_from_joinlist and subroutines substantially. (Decision
whether the plain or the grouped relation should be involved in joining makes
little sense at the leaf level of the join tree.)

So I took the approach that resembles the partial paths - separate pathlists
within the same RelOptInfo.

> 2. Try to push down aggregates based on the equivalence classes, where
> grouping properties can be transferred from one relation to the other using
> EC mechanism.

I don't think the EC part should increase the patch complexity a lot. Unless I
missed something, it's rather isolated to the part where target of the grouped
paths is assembled. And I think it's important even for initial version of the
patch.

> This seems to require solving the problem of combining aggregates across the
> relations. But there might be some usecases which could benefit without
> solving this problem.

If "combining aggregates ..." refers to joining grouped relations, then I
insist on doing this in the initial version of the new feature too. Otherwise
it'd only work if exactly one base relation of the query is grouped.

> 3. If the relation to which we push the aggregate is an append relation,
> push (partial) aggregation/grouping down into the child relations. - We
> don't do that right now even for grouping aggregation on a single append
> table. Parallel partial aggregation does that, but not exactly per
> relation. That may be a sizable project in itself. Even without this piece
> the rest of the optimizations proposed by this patch are important.

Yes, this can be done in a separate patch. I'll consider it.

> 4. Additional goal: push down the aggregation to any relation (join/base)
> where it can be computed.

I think this can be achieved by adding extra aggregation nodes to the join
tree. As I still anticipate more important design changes, this part is not at
the top of my TODO list.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-01-17 16:41:54 Re: Logical Replication WIP
Previous Message Peter Eisentraut 2017-01-17 16:34:41 Re: [PATCH] Rename pg_switch_xlog to pg_switch_wal