Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tavin Cole <tavin(dot)cole(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tavin Cole <tavincole+pgbg(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop
Date: 2021-02-22 19:50:46
Message-ID: CAFj8pRCfTtd3A_btuAGZ7qV8ZMrw6EtWYA92y--kZF4BS2WPTw@mail.gmail.com
Lists: pgsql-bugs

On Mon, Feb 22, 2021 at 8:31 PM Tavin Cole <tavin(dot)cole(at)gmail(dot)com> wrote:

> On Mon, Feb 22, 2021 at 4:37 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>> You can see a lot of underestimations and overestimations:
>>
>> Hash Join  (cost=15,018.84..89,129.05 rows=84,049 width=220) (actual time=2,930.751..7,516.315 rows=778,056 loops=1)
>>
>> But probably the more significant problem is the underestimation of the aggregation:
>>
>> HashAggregate  (cost=275,174.24..278,605.86 rows=343,162 width=487) (actual time=269.775..1,943.691 rows=1,053,462 loops=57)
>>
>> The estimated cost is less than the real cost, and then a nested loop
>> can look cheaper.
>>
>
> Yes, this must be the union in the view I mentioned. Both halves of the
> union select from the same table with different WHERE clauses, in such a
> way that together virtually all of the 1+ million rows are returned. But
> it seems the planner estimates that a majority of the rows will be
> filtered out before applying the union.
>
> Now I see that the planner makes the same miscalculation in the v11 plan,
> but at a point in the execution plan where it is harmless.
>
> It seems postgres was never able to successfully estimate this query, but
> in the past we got lucky with the optimization barrier created by the CTEs.
>
> I wonder if this is a situation which the planner might be able to
> successfully estimate in the future, with some improvements, or if it is
> just impossible because the SQL is unoptimized?
>
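
To the CTE point: since v12, a CTE that is referenced only once is
inlined into the outer query by default, and the old barrier can be
restored explicitly with the MATERIALIZED keyword. A minimal sketch -
all object names here (union_view, items, orders) are invented, not
from the reported schema:

CREATE VIEW union_view AS       -- two halves over the same table; together
SELECT id, payload FROM items   -- they return nearly all of the rows
WHERE kind = 'a'
UNION
SELECT id, payload FROM items
WHERE kind <> 'a';

WITH v AS MATERIALIZED (        -- MATERIALIZED restores the pre-v12 barrier
    SELECT id, payload FROM union_view
)
SELECT o.order_date, v.payload
FROM v
JOIN orders o ON o.id = v.id;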

It is a difficult question - some estimation errors can be fixed, some
cannot. It depends on the data and on the data model (some data models are
unhappily designed). Sometimes it is better to break a query into
subqueries, store the results in temp tables, run ANALYZE on these temp
tables, and then continue the query. This usually fixes estimation errors
well, but there is the overhead of the temp tables.
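
A minimal sketch of this technique, reusing the invented names from
above (none of them are from the reported query):

CREATE TEMP TABLE step1 AS
SELECT id, payload FROM union_view;  -- run the misestimated part alone

ANALYZE step1;                       -- give the planner real row counts

SELECT o.order_date, s.payload
FROM step1 s
JOIN orders o ON o.id = s.id;        -- now planned with accurate statistics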

Tomas Vondra is working on expression statistics, and I can say that
estimation has made big progress in the last three years. But the
precision of any estimation will always be limited.
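
The extended statistics that already exist can help when estimates go
wrong because of correlated columns; a hypothetical example (the table
and column names are invented):

CREATE STATISTICS orders_stats (dependencies, ndistinct, mcv)
    ON status, region FROM orders;  -- the mcv kind needs PG 12+
ANALYZE orders;                     -- extended stats are built by ANALYZE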

> Thanks all for the help.
>
> Best,
> Tavin
>
>
