Re: Eager aggregation, take 3

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2025-01-22 06:48:43
Message-ID: CAMbWs4_aji0kME490phz6nTXnPToddUn19OF3rLm1g4TbNkuzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 22, 2025 at 1:36 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Thanks for the example. What seems to be happening here is that each
> of the three joins increases the number of rows by a multiple of
> either 166 or 333. Aggregating reduces the number of rows to 3. I am
> not sure that we should be too concerned about this kind of case,
> because I don't think it will be common to have multiple joins that
> dramatically increase the row count. If you did have that, you must
> want to aggregate multiple times. We don't have the code for an
> IntermediateAggregate or CombineAggregate node right now, I believe,
> but in this query it would likely make sense to apply such a step
> after every join; then you'd never have more than three rows.

Haha, I did once think about the concept of multi-stage aggregations
while working on this patch. While testing this patch and trying to
figure out where placing the partial aggregation would bring the most
benefit, I noticed that a potentially effective approach could be
this: every time the row count increases to a certain point as we join
more and more tables, we perform one aggregation to deflate it, and
then wait for it to grow again before deflating it once more.

This approach would require injecting multiple intermediate
aggregation nodes into the path tree, for which we currently lack the
necessary architecture. As a result, I didn't pursue this idea
further. However, I'm really glad you mentioned this approach, though
it's still unclear whether it's a feasible or reasonable idea.

> Honestly, I'm not sure how much we should worry about a case like
> this. I think that if a user is writing queries that use joins to
> vastly inflate the row count and then aggregate the result, perhaps
> they need to think about rewriting the queries. In this instance, it
> feels a bit like the user is emulating multiplication using an
> iterated SUM(), which is probably never going to work out all that
> well.

I don't have much experience with end-user scenarios, so I'm not sure
if it's common to have queries where the row count increases with more
and more tables joined.

> But I bet it's possible to construct an example using only
> row-reducing joins. Let's say we start with 10k rows that aggregate to
> 10 rows; after performing a join, we end up with 9k rows that
> aggregate to 9 rows. So if we partially aggregate first, we have to
> aggregate 1000 extra rows, but if we join first, we have to join 1000
> extra rows. I don't think we can say a priori which will be cheaper,
> but my idea would make the path that partially aggregates after the
> join win unconditionally.

Yeah, this is the concern I raised upthread: the efficiency gained
from a path having a smaller row count may not always justify the high
cost of the path itself, especially as we move higher in the join
tree.

> To be honest, I was quite surprised this was a percentage like 50% or
> 80% and not a multiple like 2 or 5. And I had thought the multiplier
> might even be larger, like 10 or more. The thing is, 50% means we only
> have to form 2-item groups in order to justify aggregating twice.
> Maybe SUM() is cheap enough to justify that treatment, but a more
> expensive aggregate might not be, especially things like string_agg()
> or array_agg() where aggregation creates bigger objects.

Hmm, if I understand correctly, the "percentage" and the "multiple"
work in the same way. Percentage 50% and multiple 2 both mean that
the average group size is 2, and percentage 90% and multiple 10 both
mean that the average group size is 10. In general, this relationship
should hold: percentage = 1 - 1/multiple. However, I might not have
grasped your point correctly.

> Another thing to consider is that when the number of groups is small
> enough that we don't need to do a Sort+GroupAggregate, it doesn't seem
> so bad to perform marginally-useful partial aggregation, but sometimes
> that won't be the case. For example, imagine that the user wants to
> join orders to order_lines and then compute SUM(order_lines.quantity)
> for each orders.customer_id. If the size of the order_lines tables is
> large relative to work_mem, we're going to need to sort it in order
> to partially aggregate, which is expensive. If it turns out that the
> orders table is also quite big, then maybe we'll end up performing a
> merge join and the same sort order can be used for both operations,
> but if not, we could've just done a hash join with orders as the build
> table. In that kind of case, partial aggregation has to save quite a
> lot to justify itself.
>
> Now, maybe we shouldn't worry about that when applying this heuristic
> cutoff; after all, it's the job of the cost model to understand that
> sorting is expensive, and this cutoff should just be there to make
> sure we don't even try the cost model in cases where it's clearly
> unpromising. But I do suspect that in queries where the average group
> size is 2, this will often be a marginal technique. In addition to the
> problems already mentioned, it could be that the average group size is
> 2 but a lot of groups are actually of size 1 and then there are some
> larger groups. In such cases I'm even less sure that the partial
> aggregation technique will be a winner. Building many 1-element groups
> sounds inefficient.

Yeah, as you summarized, this heuristic is primarily used to discard
unpromising paths, ensuring they aren't considered further. For the
paths that pass this heuristic, the cost model will then determine the
appropriate aggregation and join methods. If we take this into
consideration when applying the heuristic, it seems to me that we
would essentially be duplicating the work that the cost model
performs, which doesn't seem necessary.

I think you are right that in cases where a lot of groups are actually
of size 1 and then there are some larger groups, the partial
aggregation may not be a win. Perhaps we can do better in this if we
have the techniques to estimate the distribution of data across
different groups or to predict how skewed the data might be. It seems
that we don't have such techniques at the moment. This also reminds
me of a similar challenge when calculating the startup cost of
incremental sort. I looked into cost_incremental_sort() and found
that we're currently using the average group size to estimate the
startup cost (please correct me if I'm wrong).

group_tuples = input_tuples / input_groups;

I think this may also suffer from data skew across different groups.
With the mentioned techniques, I believe we could improve the cost
estimation for incremental sort as well.

If I understand correctly, your main concern is the threshold being
set to 2, rather than the heuristic itself, right? Do you think
increasing this threshold to 10 or a larger value would help mitigate
the issue?

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Peter Smith 2025-01-22 06:47:06 Re: Pgoutput not capturing the generated columns