Re: Eager aggregation, take 3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2025-01-16 21:40:20
Message-ID: 1865234.1737063620@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm very sorry for not having had any time to look at this patch
before --- it's been on my radar screen for awhile, but $LIFE has
been rather demanding lately.

Anyway, I've now read through the mail thread and portions of the
v16 patch, and I have to concur with Robert's qualms about whether
this is ready. A few observations:

* The README addition, and the basically identical text in the
commit message, don't even provide a reason to believe that the
transformation is correct let alone that it will result in faster
execution. I don't understand why it's so hard to provide a solid
correctness argument. This work was supposedly based on an academic
paper; surely that paper must have included a correctness proof?
PG might need a few refinements, like being specific about what we
expect from the equality operators. But an EXPLAIN plan is not an
argument.

* As for the performance aspect, we're given

Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)

As far as I can see, this will require aggregation to be performed
across every row of "b", whereas the naive way would have aggregated
across only rows having join partners in "a". If most "b" rows lack
a join partner then this will be far slower than the naive way.
I do see that it can be better if most "b" rows have multiple join
partners, because we'll re-use partial aggregation results instead
of (effectively) recalculating them. But the README text makes it
sound like this is an unconditional win, which is not the right
mindset. (In fact, in this specific example where a.i is presumed
unique, how's it a win at all?)

* I'm also concerned about what happens with aggregates that can have
large partial-aggregation values, such as string_agg(). With the
existing usage of partial aggregation for parallel queries, it's
possible to be confident that there are not many partial-aggregation
values in existence at the same time. I don't think that holds for
pushed-down aggregates: for example, I wouldn't be surprised if the
planner chooses a join plan that requires stuffing all those values
into a hash table, or "materializes" the output of the partial
aggregation step. Do we have logic that will avoid blowing out
memory during such queries?

* I am just as worried as Robert is about the notion of different
paths for the same RelOptInfo having different rowcount estimates.
That is an extremely fundamental violation of basic planner
assumptions. We did bend it for parameterized paths by restating
those assumptions as (from optimizer/README):

To keep cost estimation rules relatively simple, we make an implementation
restriction that all paths for a given relation of the same parameterization
(i.e., the same set of outer relations supplying parameters) must have the
same rowcount estimate. This is justified by insisting that each such path
apply *all* join clauses that are available with the named outer relations.

I don't see any corresponding statement here, and it's not clear
to me that the point has been thought through adequately.

Another aspect that bothers me is that a RelOptInfo is understood
to contain a bunch of paths that all yield the same data (the same
set of columns), and it seems like that might not be the case here.
Certainly partially-aggregated paths will output something different
than unaggregated ones, but mightn't different join orders mutate the
column set even further?

I think that we might be better off building a separate RelOptInfo for
each way of pushing down the aggregates, in order to preserve the
principle that all the paths in any one RelOptInfo have the same
output. This'll mean more RelOptInfos, but not more paths, so
I doubt it adds that much performance overhead.

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> Back in November, I considered the possibility of introducing a
> GroupPathInfo into the Path structure to store the location of the
> partial aggregation as well as the estimated rowcount for this grouped
> path, similar to how ParamPathInfo functions for parameterized paths.
> However, after some exploration, I determined that this was
> unnecessary.

Why did you determine that was unnecessary? The principal function
of ParamPathInfo IMV is to ensure that we use exactly the same
rowcount estimate for all the paths that should have the same
estimate, and that problem seems to exist here as well. If you
don't have a forcing mechanism then paths' estimates will diverge
as a result of things like different roundoff errors in different
join sequences.

Anyway, I agree with Robert that this isn't ready. I don't feel
that I can even review it adequately without a lot better internal
documentation, specifically a clearer statement of what query shapes
the optimization applies to and what's the rationale for the
transformation being correct. The commentary in pathnodes.h for the
new data structures is likewise so skimpy as to be near useless.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-01-16 21:43:49 Re: Trigger more frequent autovacuums of heavy insert tables
Previous Message Noah Misch 2025-01-16 20:52:54 Re: An improvement of ProcessTwoPhaseBuffer logic