From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2025-01-15 06:58:02
Message-ID: CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 15, 2025 at 12:07 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jan 12, 2025 at 9:04 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > Attached is an updated version of this patch that addresses Jian's
> > review comments, along with some more cosmetic tweaks. I'm going to
> > be looking at this patch again from the point of view of committing
> > it, with the plan to commit it late this week or early next week,
> > barring any further comments or objections.
>
> I feel this is rushed. This is a pretty big patch touching a sensitive
> area of the code. I'm the only senior hacker who has reviewed it, and
> I would say that I've only reviewed it pretty lightly, and that the
> concerns I raised were fairly substantial. I don't think it's
> customary to go from that point to commit after one more patch
> revision. This really deserves to be looked at by multiple senior
> hackers familiar with the planner; or at least by Tom.
Thank you for your input. In fact, there have been several changes
since your last review, as I mentioned in the off-list email.
However, I agree that it would be great if someone else, especially
Tom, could take a look at this patch.
> My core concerns here are still what they were in the first email I
> posted to the thread: it's unclear that the cost model will deliver
> meaningful answers for the grouped rels, and it doesn't seem like
> you've done enough to limit the overhead of the feature.
>
> With regard to the first, I reiterate that we are in general quite bad
> at having meaningful statistics for anything above an aggregate, and
> this patch greatly expands how much of a query could be above an
> aggregate. I felt back in August when I did my first review, and still
> feel now, that when faced with a query where aggregation could be done
> at any of several levels, the chances of picking the right one are not
> much better than random. Why do you think otherwise?
I understand that we're currently quite bad at estimating the number
of groups after aggregation. In fact, it's not just aggregation
estimates — we're also bad at join estimates in some cases. This is a
reality we have to face. Here's what I think: we should be trying our
best to cost each node type as accurately as possible, and then build
the upper nodes based on those costs. We should not conclude that,
because we are unable to accurately cost one node type, we should
avoid any cost-based optimizations above that node.
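To make the transformation under discussion concrete, here is a minimal sketch (my own illustration, not code from the patch) showing that partially aggregating below a join and finalizing above it yields the same answer as aggregating after the join. The table names and data are made up for the example:

```python
from collections import defaultdict

# Toy rows: sales(product_id, amount) joined to products(product_id, name),
# then SUM(amount) grouped by name.  (Illustrative data, not from the thread.)
sales = [(1, 10), (1, 20), (2, 5), (2, 5), (3, 7)]
products = [(1, "apple"), (2, "pear"), (3, "plum")]

def agg_after_join():
    # Conventional plan: join first, aggregate once at the top.
    totals = defaultdict(int)
    names = dict(products)
    for pid, amount in sales:
        totals[names[pid]] += amount
    return dict(totals)

def eager_agg():
    # "Eager" plan: partially aggregate sales on the join key first,
    # then join the (potentially much smaller) grouped input and finalize.
    partial = defaultdict(int)
    for pid, amount in sales:
        partial[pid] += amount            # PartialAgg below the join
    totals = defaultdict(int)
    for pid, name in products:
        if pid in partial:
            totals[name] += partial[pid]  # Finalize above the join
    return dict(totals)

assert agg_after_join() == eager_agg()
print(eager_agg())  # {'apple': 30, 'pear': 10, 'plum': 7}
```

The win comes from shrinking the join input; whether it is a win in practice is exactly the costing question being debated here.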
Actually, performing aggregation before joins is not a new concept;
consider JOIN_UNIQUE_OUTER/INNER, for example:
explain (costs off)
select * from t t1 join t t2 on t1.b = t2.b
where (t1.a, t1.b) in
(select t3.a, t3.b from t t3, t t4 where t3.a > t4.a);
                      QUERY PLAN
------------------------------------------------------
 Hash Join
   Hash Cond: ((t2.b = t1.b) AND (t3.a = t1.a))
   ->  Hash Join
         Hash Cond: (t2.b = t3.b)
         ->  Seq Scan on t t2
         ->  Hash
               ->  HashAggregate
                     Group Key: t3.a, t3.b
                     ->  Nested Loop
                           Join Filter: (t3.a > t4.a)
                           ->  Seq Scan on t t3
                           ->  Materialize
                                 ->  Seq Scan on t t4
   ->  Hash
         ->  Seq Scan on t t1
(15 rows)
I believe the HashAggregate node in this plan faces the same problem
of inaccurate estimates. However, I don't think it's reasonable to
say that, because we cannot accurately cost the Aggregate node, we
should stop considering JOIN_UNIQUE_OUTER/INNER at all.
Back in August, I responded to this concern by suggesting that "maybe
we can run some benchmarks first and investigate the regressions
discovered on a case-by-case basis". In October, I ran the TPC-DS
benchmark at scale 10 and observed that eager aggregation was applied
in 7 queries, with no notable regressions; Q4 and Q11 even showed
performance improvements of 3-4 times. Please see [1].
> With regard to the second, I suggested several lines of thinking back
> in August that could lead to limiting the number of grouped_rels that
> we create, but it doesn't really look like much of anything has
> changed. We're still creating a partially grouped rel for every
> baserel in the query, and every joinrel in the query. I'm not very
> happy with "let's just turn it off by default" as the answer to that
> concern. A lot of people won't enable the feature, which will mean it
> doesn't have much value to our users, and those who do will still see
> a lot of overhead. Maybe I'm wrong, but I bet with some good
> heuristics the planning cost of this could be reduced by an order of
> magnitude or more. If that were done, we could imagine eventually (or
> maybe even immediately) enabling this by default; without that, we
> still have the burden of maintaining this code and keeping it working,
> but almost nobody will benefit.
Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
to limit the planning effort for eager aggregation. For more details,
please see [2].
And I don't think it's correct to say that we create a partially
grouped rel for every baserel and every joinrel. This patch includes
a bunch of logic to determine whether it's appropriate to create a
grouped rel for a base or join rel. Furthermore, with the
EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
possible, we will skip it if the grouped paths are considered not
useful. All of these measures help reduce the number of grouped
paths as well as the grouped relations in many cases where eager
aggregation would not help a lot.
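The shape of such a ratio-based cutoff can be sketched as follows. This is a hypothetical illustration of the idea only: the constant, the function name, and the exact condition are my assumptions, not the actual EAGER_AGGREGATE_RATIO code from the patch (see [2] for the real details):

```python
# Hypothetical sketch of a ratio-based pruning heuristic in the spirit of
# EAGER_AGGREGATE_RATIO: skip building grouped paths when partial
# aggregation is not expected to shrink its input enough to pay for itself.
EAGER_AGGREGATE_RATIO = 0.5  # assumed threshold, for illustration

def grouped_rel_is_useful(input_rows: float, estimated_groups: float) -> bool:
    """Return False when the grouped input retains too many rows for
    partial aggregation to be worth the extra planning and execution work."""
    if input_rows <= 0:
        return False
    return estimated_groups / input_rows <= EAGER_AGGREGATE_RATIO

assert grouped_rel_is_useful(1_000_000, 100)  # big reduction: build grouped rel
assert not grouped_rel_is_useful(1000, 900)   # barely shrinks: skip it
```

A cutoff of this shape prunes grouped rels early, before any grouped paths are built, which is where the planning-overhead savings come from.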
Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
in the planning cost, at least for the 7 queries where eager
aggregation is applied. As I said in [1], "For the planning time, I
do not see notable regressions for any of the seven queries". In
fact, I initially thought that we might consider enabling this by
default, given the positive benchmark results, but I just couldn't
summon the courage to do it. Perhaps we should reconsider enabling it
by default, so users can benefit from the new feature and help
identify any potential bugs.
> + <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
> + <para>
> + Enables or disables the query planner's ability to partially push
> + aggregation past a join, and finalize it once all the relations are
> + joined. The default is <literal>off</literal>.
>
> I'm a bit concerned about the naming here. I feel like we're adding an
> increasing number of planner features with an increasing number of
> disabling GUCs that are all a bit random. I kind of wonder if this
> should be called enable_incremental_aggregate. Maybe that's worse,
> because "eager" is a word we're not using for anything yet, so using
> it here improves greppability and perhaps understandability. On the
> other hand, the aggregate that is pushed down by this feature is
> always partial (I believe) so we still need a finalize step later,
> which means we're aggregating incrementally. There's some nice parity
> with incremental sort, too, perhaps.
As I mentioned in [3], the name "Eager Aggregation" is inherited from
the paper "Eager Aggregation and Lazy Aggregation" [4], from which
many of the ideas in this feature are derived. Personally, I like
this name a lot, but I'm open to other names if others find it
unreasonable.
> +/* The original length and hashtable of a RelInfoList */
> +typedef struct
> +{
> + int savelength;
> + struct HTAB *savehash;
> +} RelInfoListInfo;
>
> Both the comment and the name of the data type are completely meaningless.
Thanks. Will fix the comment and the name for this data type.
> + /*
> + * Try at least sorting the cheapest path and also try
> + * incrementally sorting any path which is partially sorted
> + * already (no need to deal with paths which have presorted
> + * keys when incremental sort is disabled unless it's the
> + * cheapest input path).
> + */
>
> This would be the fifth copy of this comment. It's not entirely this
> patch's fault, of course, but some kind of refactoring or cleanup is
> probably needed here.
Agreed. However, I think it would be better to refactor this in a
separate patch. This issue also exists on master, and I'd prefer to
avoid introducing such refactors in this already large patch.
> + * cheapest_parameterized_paths also always includes the fewest-row
> + * unparameterized path, if there is one, for grouped relations. Different
> + * paths of a grouped relation can have very different row counts, and in some
> + * cases the cheapest-total unparameterized path may not be the one with the
> + * fewest row.
>
> As I said back in October, this seems like mixing together in one
> RelOptInfo paths that really belong to two different RelOptInfos.
I understand you are referring to the design you suggested in October,
where "PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
RelOptInfos", on the grounds that "it's less clear whether it's fair
to compare across the two categories". I've shared my thoughts on
this in [5].
Furthermore, even if we separate these grouped paths into two
different RelOptInfos, we still face the issue that "different paths
of a grouped relation can have very different row counts", and we need
a way to handle this. One could argue that we can separate the
grouped paths where partial aggregation is placed at different
locations into different RelOptInfos, but this would lead to an
explosion in the number of RelOptInfos for grouped relations as we
climb up the join tree. I think this is neither realistic nor
necessary.
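To put a rough number on that explosion (this framing and the worst-case count are my own back-of-the-envelope illustration, not analysis from the patch): if every distinct placement of partial aggregation got its own RelOptInfo, a joinrel over n base relations could in the worst case need one grouped rel per non-empty subset of inputs below which aggregation was pushed, versus a single grouped rel per joinrel holding all such paths:

```python
# Worst-case count of grouped RelOptInfos per joinrel, under the assumption
# (mine, for illustration) that each partial-aggregation placement maps to
# a non-empty subset of the joinrel's n base relations.
def grouped_relinfos_per_joinrel(n_baserels: int, separate_per_placement: bool) -> int:
    if separate_per_placement:
        return 2 ** n_baserels - 1  # one rel per placement: exponential growth
    return 1                        # one rel holding all grouped paths

for n in (2, 4, 8):
    print(n, grouped_relinfos_per_joinrel(n, True))  # 3, 15, 255
```

Even if the real count is smaller after pruning, the growth trend is what makes the one-rel-per-placement design look unworkable as the join tree gets taller.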
[1] https://postgr.es/m/CAMbWs49DrR8Gkp3TUwFJV_1ShtmLzQUq3mOYD+GyF+Y3AmmrFw@mail.gmail.com
[2] https://postgr.es/m/CAMbWs48OS3Z0G5u3fhar1=H_ucmEcUaX0tRUNpcLQxHt=z4Y7w@mail.gmail.com
[3] https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
[4] https://www.vldb.org/conf/1995/P345.PDF
[5] https://postgr.es/m/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com
Thanks
Richard