Re: Eager aggregation, take 3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tender Wang <tndrwang(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2024-08-29 13:02:10
Message-ID: CA+TgmoZoJSg1gdL-c9CEVdvA0s60_krpSwKsdY2j_L1MO0hH-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 28, 2024 at 11:38 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
> I upload EXPLAIN(COSTS ON, ANALYZE) test to [1].
> I ran the same query three times, and I chose the third time result.
> You can check 19_off_explain.out and 19_on_explain.out.

So, in 19_off_explain.out, we got this:

-> Finalize GroupAggregate (cost=666986.48..667015.35
rows=187 width=142) (actual time=272.649..334.318 rows=900 loops=1)
-> Gather Merge (cost=666986.48..667010.21 rows=187
width=142) (actual time=272.644..333.847 rows=901 loops=1)
-> Partial GroupAggregate
(cost=665986.46..665988.60 rows=78 width=142) (actual
time=266.379..267.476 rows=300 loops=3)
-> Sort (cost=665986.46..665986.65
rows=78 width=116) (actual time=266.367..266.583 rows=5081 loops=3)

And in 19_on_explan.out, we got this:

-> Finalize GroupAggregate (cost=666987.03..666989.77
rows=19 width=142) (actual time=285.018..357.374 rows=900 loops=1)
-> Gather Merge (cost=666987.03..666989.25 rows=19
width=142) (actual time=285.000..352.793 rows=15242 loops=1)
-> Sort (cost=665987.01..665987.03 rows=8
width=142) (actual time=273.391..273.580 rows=5081 loops=3)
-> Nested Loop (cost=665918.00..665986.89
rows=8 width=142) (actual time=252.667..269.719 rows=5081 loops=3)
-> Nested Loop
(cost=665917.85..665985.43 rows=8 width=157) (actual
time=252.656..264.755 rows=5413 loops=3)
-> Partial GroupAggregate
(cost=665917.43..665920.10 rows=82 width=150) (actual
time=252.643..255.627 rows=5413 loops=3)
-> Sort
(cost=665917.43..665917.64 rows=82 width=124) (actual
time=252.636..252.927 rows=5413 loops=3)

So, the patch was expected to cause the number of rows passing through
the Gather Merge to decrease from 197 to 19, but actually caused the
number of rows passing through the Gather Merge to increase from 901
to 15242. When the PartialAggregate was positioned at the top of the
join tree, it reduced the number of rows from 5081 to 300; but when it
was pushed down below two joins, it didn't reduce the row count at
all, and the subsequent two joins reduced it by less than 10%.

Now, you could complain about the fact that the Parallel Hash Join
isn't well-estimated here, but my question is: why does the planner
think that the PartialAggregate should go specifically here? In both
plans, the PartialAggregate isn't expected to change the row count.
And if that is true, then it's going to be cheapest to do it at the
point where the joins have reduced the row count to the minimum value.
Here, that would be at the top of the plan tree, where we have only
5081 estimated rows, but instead, the patch chooses to do it as soon
as we have all of the grouping columns, when we. still have 5413 rows.
I don't understand why that path wins on cost, unless it's just that
the paths compare fuzzily the same, in which case it kind of goes to
my earlier point about not really having the statistics to know which
way is actually going to be better.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-08-29 13:03:19 Re: Add contrib/pg_logicalsnapinspect
Previous Message Daniel Gustafsson 2024-08-29 13:00:37 Re: [BUG] Security bugs affected version detected.