Re: Eager aggregation, take 3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2024-11-01 12:41:51
Message-ID: CA+TgmobVtypTbaxGA2nUQO=Oa=buOJDRG9fHzdsWT08ygBkNAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 1, 2024 at 2:21 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> ... an aggregated row from the partial
> aggregation matches the other side of the join if and only if each row
> in the partial group does, thereby ensuring that all rows in the same
> partial group have the same 'destiny'.

Ah, I really like this turn of phrase! I think it's clearer and
simpler than what I said. And I think it implies that we don't need to
explicitly deduce surrogate grouping keys. For example if we have A
JOIN B JOIN C JOIN D JOIN E JOIN F, grouped by columns from A, we
don't need to work out surrogate grouping keys for B and then C and
then D and then E and then F. We can just look at F's join clauses and
that tells us how to group, independent of anything else.

But is there any hole in that approach? I think the question is
whether the current row could be used in some way that doesn't show up
in the join clauses. I can't think of any way for that to happen,
really. I believe that any outerjoin-delayed quals will show up as
join clauses, and stuff like ORDER BY and HAVING will happen after the
aggregation (at least logically) so it should be fine. Windowed
functions and ordered aggregates may be a blocker to the optimization,
though: if the window function needs access to the unaggregated rows,
or even just needs to know how many rows there are, then we'd better
not aggregate them before the window function runs; and if the
aggregate is ordered, we can only partially aggregate the data if it
is already ordered in a way that is compatible with the final, desired
ordering. Another case we might need to watch out for is RLS. RLS
wants to apply all the security quals before any non-leakproof
functions, and pushing down the aggregation might push an aggregate
function past security quals. Perhaps there are other cases to worry
about as well; this is all I can think of at the moment.

But regardless of those kinds of cases, the basic idea that we want
the partially aggregate rows to join if and only if the unaggregated
rows would have joined seems exactly correct to me, and that provides
theoretical justification for deriving the surrogate grouping key
directly from the join quals. Woot!

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Torsten Förtsch 2024-11-01 12:42:55 Re: Allowing pg_recvlogical to create temporary replication slots
Previous Message Alastair Turner 2024-11-01 12:12:35 Re: Count and log pages set all-frozen by vacuum