Re: [HACKERS] WIP: Aggregation push-down

From: Richard Guo <riguo(at)pivotal(dot)io>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2019-07-15 10:28:15
Message-ID: CAN_9JTzAsBwv5zr4SRVO35OtMWQBXSLcUuY65NQ0ZtFCTOrMMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2019 at 4:42 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Richard Guo <riguo(at)pivotal(dot)io> wrote:
>
> > I didn't fully follow the whole thread and mainly looked into the
> > latest
> > patch set. So what are the considerations for abandoning the
> > aggmultifn
> > concept?
>
> Originally the function was there to support join where both relations are
> grouped. My doubts about usefulness of such join started at [1]. (See the
> thread referenced from [2].)
>
> > In my opinion, aggmultifn would enable us to do a lot more
> > types of transformation. For example, consider the query below:
> >
> > select sum(foo.c) from foo join bar on foo.b = bar.b group by foo.a,
> > bar.a;
> >
> > With the latest patch, the plan looks like:
> >
> > Finalize HashAggregate <------ sum(psum)
> > Group Key: foo.a, bar.a
> > -> Hash Join
> > Hash Cond: (bar.b = foo.b)
> > -> Seq Scan on bar
> > -> Hash
> > -> Partial HashAggregate <------ sum(foo.c) as
> > psum
> > Group Key: foo.a, foo.b
> > -> Seq Scan on foo
> >
> >
> > If we have aggmultifn, we can perform the query this way:
> >
> > Finalize HashAggregate <------ sum(foo.c)*cnt
> > Group Key: foo.a, bar.a
> > -> Hash Join
> > Hash Cond: (foo.b = bar.b)
> > -> Seq Scan on foo
> > -> Hash
> > -> Partial HashAggregate <------ count(*) as cnt
> > Group Key: bar.a, bar.b
> > -> Seq Scan on bar
>
> Perhaps, but it that would require changes to nodeAgg.c, which I want to
> avoid
> for now.
>
> > And this way:
> >
> > Finalize HashAggregate <------ sum(psum)*cnt
> > Group Key: foo.a, bar.a
> > -> Hash Join
> > Hash Cond: (foo.b = bar.b)
> > -> Partial HashAggregate <------ sum(foo.c) as
> > psum
> > Group Key: foo.a, foo.b
> > -> Seq Scan on foo
> > -> Hash
> > -> Partial HashAggregate <------ count(*) as cnt
> > Group Key: bar.a, bar.b
> > -> Seq Scan on bar
>
> This looks like my idea presented in [1], for which there seems to be no
> common use case.
>
> > My another question is in function add_grouped_path(), when creating
> > sorted aggregation path on top of subpath. If the subpath is not
> > sorted,
> > then the sorted aggregation path would not be generated. Why not in
> > this
> > case we create a sort path on top of subpath first and then create
> > group
> > aggregation path on top of the sort path?
>
> I see no reason not to do it. (If you want to try, just go ahead.) However
> the
> current patch version brings only the basic functionality and I'm not
> going to
> add new functionality (such as parallel aggregation, partitioned tables or
> postgres_fdw) until the open design questions are resolved. Otherwise
> there's
> a risk that the additional work will be wasted due to major rework of the
> core
> functionality.
>
> > Core dump when running one query in agg_pushdown.sql
>
> Thanks for the report! Fixed in the new version.
>

Another core dump for query below:

select sum(t1.s1) from t1, t2, t3, t4 where t1.j1 = t2.j2 group by t1.g1,
t2.g2;

This is due to a small mistake:

diff --git a/src/backend/optimizer/util/relnode.c
b/src/backend/optimizer/util/relnode.c
index 10becc0..9c2deed 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -648,7 +648,7 @@ void
add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
{
add_rel_info(root->grouped_rel_list, rel);
- add_rel_info(root->agg_info_list, rel);
+ add_rel_info(root->agg_info_list, agg_info);
}

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Guo 2019-07-15 10:52:01 Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Previous Message Daniel Gustafsson 2019-07-15 10:12:25 Re: POC: converting Lists into arrays