Re: Eager aggregation, take 3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2025-01-15 14:40:42
Message-ID: CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 15, 2025 at 1:58 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> I understand that we're currently quite bad at estimating the number
> of groups after aggregation. In fact, it's not just aggregation
> estimates — we're also bad at join estimates in some cases. This is a
> reality we have to face. Here's what I think: we should be trying our
> best to cost each node type as accurately as possible, and then build
> the upper nodes based on those costs. We should not conclude that,
> because we are unable to accurately cost one node type, we should
> avoid any cost-based optimizations above that node.

Well, I agree with that last sentence, for sure. But I don't think
it's true that the situations with joins and aggregates are
comparable. We are much better able to estimate the number of rows
that will come out of a join than we are to estimate the number of
rows that come out of an aggregate. It's certainly true that in some
cases we get join estimates badly wrong, and I'd like to see us do
better there, but our estimates of the number of distinct values that
exist in a column are the least reliable part of our statistics system
by far.
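
For reference, those per-column distinct-value estimates are the
n_distinct figures that ANALYZE stores, which can be inspected with
something like:

    -- negative n_distinct means a fraction of the table's row count
    -- rather than an absolute number of distinct values
    SELECT tablename, attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'some_table';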

Also, we look at the underlying statistics for a column variable even
after joins and aggregates and assume (not having any other
information) that the distribution after that operation is likely to
be similar to the distribution before that operation. Consider a table
A with columns x and y. Let's say x is a unique ID and y is a
dependent value with some distribution over a finite range of
possibilities (e.g. a person's age). If we join table A to some other
table B on A.x = B.x and filter out some of the rows via that join,
the distribution of values in column y is likely to be altered. If the
rows are removed at random, the original distribution will prevail,
but often it won't be random and so the distribution will change in a
way we can't predict. However, guessing that the pre-join distribution
of A.y still prevails isn't crazy, and it's better than assuming we can
say nothing about the distribution.
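
To make the shape of that case concrete in SQL (the table definitions
here are just illustrative):

    CREATE TABLE a (x int PRIMARY KEY, y int);  -- y: e.g. a person's age
    CREATE TABLE b (x int, z int);

    -- The join may remove an arbitrary, possibly non-random subset of a's
    -- rows, but we still estimate the distribution of a.y in the output
    -- from a's per-column statistics.
    SELECT a.y
    FROM a JOIN b ON a.x = b.x;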

But now let's say that after joining to B, we perform an aggregation
operation, computing the minimum value of A.y for each value of B.z. At
this point, we have no usable statistics for either output column. The
result must be unique on B.z, and the distribution of MIN(A.y) is
going to be entirely different from the distribution of A.y. Any
future joins that we perform here will have to be estimated without
any MCVs, which is going to reduce the accuracy of the estimation by a
lot. In summary, the join makes relying on our MCV information less
likely to be accurate, but the aggregate makes it impossible to rely
on our MCV information at all. In terms of the accuracy of our
results, that is a lot worse.
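
Continuing the same illustrative SQL, the aggregation step I'm
describing is roughly:

    -- One output row per b.z; min(a.y) is a computed column for which we
    -- have no MCVs or n_distinct at all, so any join stacked on top of
    -- this has to be estimated essentially blind.
    SELECT b.z, min(a.y) AS min_y
    FROM a JOIN b ON a.x = b.x
    GROUP BY b.z;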

> I believe the HashAggregate node in this plan faces the same problem
> with inaccurate estimates. However, I don't think it's reasonable to
> say that, because we cannot accurately cost the Aggregate node, we
> should disregard considering JOIN_UNIQUE_OUTER/INNER.

Fair point.

> Back in August, I responded to this issue by "Maybe we can run some
> benchmarks first and investigate the regressions discovered on a
> case-by-case basis". In October, I ran the TPC-DS benchmark at scale
> 10 and observed that eager aggregation was applied in 7 queries, with
> no notable regressions. In contrast, Q4 and Q11 showed performance
> improvements of 3–4 times. Please see [1].

I had forgotten about that, and again, fair point, but I'm concerned
that it might not be a broad enough base of queries to test against.
(7 isn't a very large number.)

> Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
> to limit the planning effort for eager aggregation. For more details,
> please see [2].

OK, I missed this, but...

> And I don't think it's correct to say that we create a partially
> grouped rel for every baserel and every joinrel. This patch includes
> a bunch of logic to determine whether it's appropriate to create a
> grouped rel for a base or join rel. Furthermore, with the
> EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
> possible, we will skip it if the grouped paths are considered not
> useful. All of these measures help reduce the number of grouped
> paths as well as the grouped relations in many cases where eager
> aggregation would not help a lot.

...it looks to me like EAGER_AGGREGATE_RATIO is used to set the
RelAggInfo's agg_useful field, which seems like it happens after the
RelOptInfo has already been created. I had been looking for something
that would avoid creating the RelOptInfo in the first place and I
didn't see it.

> Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
> in the planning cost, at least for the 7 queries where eager
> aggregation is applied. As I said in [1], "For the planning time, I
> do not see notable regressions for any of the seven queries". In
> fact, I initially thought that we might consider enabling this by
> default, given the positive benchmark results, but I just couldn't
> summon the courage to do it. Perhaps we should reconsider enabling it
> by default, so users can benefit from the new feature and help
> identify any potential bugs.

If you're going to commit this, I think it would be a good idea to
enable it by default at least for now. If there are problems, it's
better to find out about them sooner rather than later. If they are
minor they can be fixed; if they are major, we can consider whether it
is better to fix them, disable the feature by default, or revert. We
can add an open item to reconsider the default setting during beta.

> > As I said back in October, this seems like mixing together in one
> > RelOptInfo paths that really belong to two different RelOptInfos.
>
> I understand that you said about the design in October where
> "PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
> RelOptInfos", because "it's less clear whether it's fair to compare
> across the two categories". I've shared my thoughts on this in [5].
>
> Furthermore, even if we separate these grouped paths into two
> different RelOptInfos, we still face the issue that "different paths
> of a grouped relation can have very different row counts", and we need
> a way to handle this. One could argue that we can separate the
> grouped paths where partial aggregation is placed at different
> locations into different RelOptInfos, but this would lead to an
> explosion in the number of RelOptInfos for grouped relations as we
> climb up the join tree. I think this is neither realistic nor
> necessary.

It's possible you're right, but it does make me nervous. I do agree
that making the number of RelOptInfos explode would be really bad.

--
Robert Haas
EDB: http://www.enterprisedb.com
