Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

From: Tavin Cole <tavin(dot)cole(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tavin Cole <tavincole+pgbg(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop
Date: 2021-02-22 19:31:35
Message-ID: CAP93hfz_OrLmfoxyLHwCCzs4Poiq4MZjNA1xeC6GsYzhrrx1SQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Feb 22, 2021 at 4:37 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

>
> You can see lot of underestimations and overestimations
>
> Hash Join
> <http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-join> (cost=15,018.84..89,129.05
> rows=84,049 width=220) (actual time=2,930.751..7,516.315 rows=778,056
> loops=1)
>
> But probably the significant problem is underestimation of aggregation
>
> HashAggregate
> <http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#hash-aggregate> (cost=275,174.24..278,605.86
> rows=343,162 width=487) (actual time=269.775..1,943.691 rows=1,053,462
> loops=57)
>
> The estimated cost is less than in real cost, and then the cost of a
> nested loop can be cheaper.
>

Yes this must be the union in the view I mentioned. Both halves of the
union select from the same table with different where clauses, in such a
way that together virtually all of the 1+ million rows are returned. But it
seems the planner estimates that a majority of the rows will be filtered
out before applying the union.

Now I see that the planner makes the same miscalculation in the v11 plan,
but at a point in the execution plan where it is harmless.

It seems postgres was never able to successfully estimate this query, but
in the past we got lucky with the optimization barrier created by the CTEs.

I wonder if this is a situation which the planner might be able to
successfully estimate in the future, with some improvements, or if it is
just impossible because the SQL is unoptimized?

Thanks all for the help.

Best,
Tavin

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2021-02-22 19:50:46 Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop
Previous Message Pavel Stehule 2021-02-22 15:37:01 Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop