Re: Eager aggregation, take 3

From: Paul George <p(dot)a(dot)george19(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2024-07-07 02:45:32
Message-ID: CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard:

Thanks for reviving this patch and for all of your work on it! Eager
aggregation pushdown will be beneficial for my work and I'm hoping to see
it land.

I was playing around with v9 of the patches and was specifically curious
about this previous statement...

>This patch also makes eager aggregation work with outer joins. With
>outer join, the aggregate cannot be pushed down if any column referenced
>by grouping expressions or aggregate functions is nullable by an outer
>join above the relation to which we want to apply the partiall
>aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
>can easily identify such situations and subsequently refrain from
>pushing down the aggregates.

...and this related comment in eager_aggregate.out:

>-- Ensure aggregation cannot be pushed down to the nullable side

While I'm new to this work and its subtleties, I'm wondering if this is too
broad a condition.

I modified the first test query in eager_aggregate.sql to make it a LEFT
JOIN and eager aggregation indeed did not happen, which is expected based
on the comments upthread.

query:
SET enable_eager_aggregate=ON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON
t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;

plan:
QUERY PLAN
------------------------------------------------------------
GroupAggregate
Output: t1.a, sum(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Right Join
Output: t1.a, t2.c
Hash Cond: (t2.b = t1.b)
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
-> Hash
Output: t1.a, t1.b
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b
(15 rows)

(NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity)

But, it seems that eager aggregation for the query above can be
"replicated" as:

query:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
SELECT b, sum(c) c
FROM eager_agg_t2 t2p
GROUP BY b
) t2 ON t1.b = t2.b
GROUP BY t1.a
ORDER BY t1.a;

The output of both the original query and this one match (and the plans
with eager aggregation and the subquery are nearly identical if you restore
the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does
this mean that there are conditions under which eager aggregation can be
pushed down to the nullable side?

-Paul-

On Sat, Jul 6, 2024 at 4:56 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
> > I spent some time testing this patchset and found a few more issues.
> > ...
>
> > Hence here is the v8 patchset, with fixes for all the above issues.
>
> I found an 'ORDER/GROUP BY expression not found in targetlist' error
> with this patchset, with the query below:
>
> create table t (a boolean);
>
> set enable_eager_aggregate to on;
>
> explain (costs off)
> select min(1) from t t1 left join t t2 on t1.a group by (not (not
> t1.a)), t1.a order by t1.a;
> ERROR: ORDER/GROUP BY expression not found in targetlist
>
> This happens because the two grouping items are actually the same and
> standard_qp_callback would remove one of them. The fully-processed
> groupClause is kept in root->processed_groupClause. However, when
> collecting grouping expressions in create_grouping_expr_infos, we are
> checking parse->groupClause, which is incorrect.
>
> The fix is straightforward: check root->processed_groupClause instead.
>
> Here is a new rebase with this fix.
>
> Thanks
> Richard
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-07-07 05:20:30 tests fail on windows with default git settings
Previous Message Joseph Koshakow 2024-07-06 23:04:38 Re: Remove dependence on integer wrapping