Re: Eager aggregation, take 3

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2024-10-06 03:29:58
Message-ID: CAMbWs49DrR8Gkp3TUwFJV_1ShtmLzQUq3mOYD+GyF+Y3AmmrFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 5, 2024 at 6:23 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > Here is an updated version of this patch that fixes the rowcount
> > estimate issue along this routine. (see set_joinpath_size.)
>
> I have worked on inventing some heuristics to limit the planning
> effort of eager aggregation. One simple yet effective approach I'm
> thinking of is to consider a grouped path as NOT useful if its row
> reduction ratio falls below a predefined minimum threshold. Currently
> I'm using 0.5 as the threshold, but I'm open to other values.

I ran the TPC-DS benchmark at scale 10 and observed eager aggregation
applied in several queries, including q4, q8, q11, q23, q31, q33, and
q77. Notably, the regression in q19 that Tender identified with v11
has disappeared in v13.

Here’s a comparison of Execution Time and Planning Time for the seven
queries with eager aggregation disabled versus enabled (best of 3).

Execution Time:

EAGER-AGG-OFF EAGER-AGG-ON

q4 105787.963 ms 34807.938 ms

q8 1407.454 ms 1654.923 ms

q11 67899.213 ms 18670.086 ms

q23 45945.849 ms 42990.652 ms

q31 10463.536 ms 10244.175 ms

q33 2186.928 ms 2217.228 ms

q77 2360.565 ms 2416.674 ms

Planning Time:

EAGER-AGG-OFF EAGER-AGG-ON

q4 2.334 ms 2.602 ms

q8 0.685 ms 0.647 ms

q11 0.935 ms 1.094 ms

q23 2.666 ms 2.582 ms

q31 1.051 ms 1.206 ms

q33 1.248 ms 1.796 ms

q77 0.967 ms 0.962 ms

There are good performance improvements in q4 and q11 (3~4 times).
For the other queries, execution times remain largely unchanged,
falling within the margin of error, with no notable regressions
observed.

For the planning time, I do not see notable regressions for any of the
seven queries.

It seems that the new cost estimates and the new heuristic are working
pretty well.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-10-06 03:53:59 Re: GetRelationPath() vs critical sections
Previous Message David Rowley 2024-10-05 23:26:14 Re: On disable_cost