From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2024-12-17 00:56:13 |
Message-ID: | CAMbWs49v7SpPqWGAKQU=9xf-=0wKuJ13TOGBWsbqbN85b2TW5A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 4, 2024 at 11:38 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > Hmm, currently we only consider grouped aggregation for eager
> > aggregation. For grouped aggregation, the window function's
> > arguments, as well as the PARTITION BY expressions, must appear in the
> > GROUP BY clause. That is to say, the depname column in the first
> > query, or the n column in the second query, will not be aggregated
> > into the partial groups. Instead, they will remain as they are as
> > input for the WindowAgg nodes. It seems to me that this ensures
> > that we're good with window functions. But maybe I'm wrong.
>
> Returning to this point now that I understand what you meant by
> grouped aggregation:
>
> I still don't understand how you expect to be able to evaluate
> functions like LEAD() and LAG() if any form of partial aggregation has
> been done.
In grouped aggregation, the non-aggregate arguments of the window
function must appear in the GROUP BY clause, so they will not be
aggregated into the partial groups. It seems to me that this ensures
that they remain available as valid inputs for the window function.
For the Aggref arguments of the window function, their final values
are calculated in the Finalize Agg node, meaning they, too, are good
to be used as inputs for the window function.
As an example, please consider
create table tbl (a int, b int, c int);
insert into tbl select i%3, i%3, i%3 from generate_series(1,1000)i;
analyze tbl;
explain (verbose, costs off)
select lead(t1.a+sum(t2.b)) over (), sum(t2.c) from
tbl t1 join tbl t2 on t1.b = t2.b group by t1.a;
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg
Output: lead((t1.a + (sum(t2.b)))) OVER (?), (sum(t2.c)), t1.a
-> Finalize HashAggregate
Output: t1.a, sum(t2.b), sum(t2.c)
Group Key: t1.a
-> Hash Join
Output: t1.a, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.tbl t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.b, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
-> Partial HashAggregate
Output: t2.b, PARTIAL sum(t2.b), PARTIAL sum(t2.c)
Group Key: t2.b
-> Seq Scan on public.tbl t2
Output: t2.a, t2.b, t2.c
(17 rows)
It seems to me that both 't1.a' and 'sum(t2.b)' are valid inputs for
LEAD(), even though we have performed partial aggregation.
Am I missing something?
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-12-17 02:11:05 | Re: Missing initialization steps in --check and --single modes |
Previous Message | Jacob Champion | 2024-12-17 00:51:53 | Re: [PoC] Federated Authn/z with OAUTHBEARER |