From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2025-01-21 14:13:14 |
Message-ID: | CAMbWs48t2DxTKfz2-seyYxqayxnAo8b+5LV3nuPDk9gSqgLy2A@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 21, 2025 at 2:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, a partial-aggregation path does not generate the same data
> as an unaggregated path, no matter how fuzzy you are willing to be
> about the concept. So I'm having a very hard time accepting that
> it ought to be part of the same RelOptInfo, and thus I don't really
> buy that annotating paths with a GroupPathInfo is the way forward.
Agreed. One point I failed to make clear is that I never intended
to put a partial-aggregation path and an unaggregated path into the
same RelOptInfo. One of the basic designs
of this patch is that partial-aggregation paths are placed in a
separate category of RelOptInfos, which I call "grouped relations"
(though I admit that's not the best name). This ensures that we never
compare a partial-aggregation path with an unaggregated path during
scan/join planning, because I am certain that the two categories of
paths are not comparable.
Regarding the GroupPathInfo proposal, my intention is to add a valid
GroupPathInfo only for the partial-aggregation paths. The goal is to
ensure that partial-aggregation paths within this category are
compared only if their partial aggregations are at the same location.
To be honest, I still doubt that this is necessary. I have two main
reasons for this.
1.
For a partial-aggregation path, the location where we place the
partial aggregation does not impose any restrictions on further
planning. This is different from the parameterized path case. If two
parameterized paths are equal on every other figure of merit, we will
choose the one with fewer required outer rels, as it means fewer join
restrictions on upper planning. However, for partial-aggregation
paths, we do not have a preference regarding the location of the
partial aggregation. For instance, for path "A JOIN PartialAgg(B)
JOIN C" and path "PartialAgg(A JOIN B) JOIN C", if one path dominates
the other on every figure of merit, it seems to me that there's no
point in keeping the less favorable one, although they have their
partial aggregations at different join levels.
2.
A partial-aggregation path of a rel essentially yields an aggregated
form of that rel's row set. The difference between the row sets
yielded by paths with different locations of partial aggregation is
primarily about the different degrees to which the rows are
aggregated. These sets are fundamentally homogeneous.
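To make point 1 concrete, here is a minimal sketch in C of the comparison rule I have in mind. It is not code from the patch and it does not use the real planner structures: SketchPath, SketchGroupedRel, sketch_dominates and sketch_add_path are hypothetical names, and the three figures of merit (startup cost, total cost, row count) are a simplifying assumption. The sketch only shows that the location of the partial aggregation is recorded but never consulted when deciding which path to keep.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PATHS 8

/* Hypothetical, simplified stand-in for a path; NOT PostgreSQL's Path. */
typedef struct SketchPath
{
    const char *desc;           /* shape of the plan, for readability */
    double      startup_cost;
    double      total_cost;
    double      rows;
    int         agg_level;      /* join level of the partial aggregation;
                                 * recorded, but unused in comparisons */
} SketchPath;

/* Hypothetical stand-in for one "grouped relation". */
typedef struct SketchGroupedRel
{
    SketchPath  paths[SKETCH_MAX_PATHS];
    int         npaths;
} SketchGroupedRel;

/*
 * True if a is no worse than b on every figure of merit and strictly
 * better on at least one.  agg_level is deliberately ignored.
 */
static bool
sketch_dominates(const SketchPath *a, const SketchPath *b)
{
    bool        no_worse = a->startup_cost <= b->startup_cost &&
                           a->total_cost <= b->total_cost &&
                           a->rows <= b->rows;
    bool        better = a->startup_cost < b->startup_cost ||
                         a->total_cost < b->total_cost ||
                         a->rows < b->rows;

    return no_worse && better;
}

/*
 * Keep new_path unless an existing path dominates it, and drop every
 * existing path that new_path dominates.  Returns true if it was kept.
 */
static bool
sketch_add_path(SketchGroupedRel *rel, SketchPath new_path)
{
    int         kept = 0;

    /* First pass: reject new_path if something already dominates it. */
    for (int i = 0; i < rel->npaths; i++)
    {
        if (sketch_dominates(&rel->paths[i], &new_path))
            return false;
    }

    /* Second pass: compact the list, dropping dominated old paths. */
    for (int i = 0; i < rel->npaths; i++)
    {
        if (!sketch_dominates(&new_path, &rel->paths[i]))
            rel->paths[kept++] = rel->paths[i];
    }

    assert(kept < SKETCH_MAX_PATHS);
    rel->paths[kept++] = new_path;
    rel->npaths = kept;
    return true;
}
```

With made-up costs where "A JOIN PartialAgg(B) JOIN C" is no worse than "PartialAgg(A JOIN B) JOIN C" on every field, adding both to the same SketchGroupedRel leaves only the former, even though their agg_level values differ.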
In summary, I think the partial-aggregation paths
of the same "grouped relation" are comparable, regardless of the
position of the partial aggregation within the path tree. So I think
we should put them into the same RelOptInfo.
Of course, I could be very wrong about this. I would greatly
appreciate hearing others' thoughts on this.
Thanks
Richard