Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tavin Cole <tavin(dot)cole(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, tavincole+pgbg(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop
Date: 2021-02-22 14:21:21
Message-ID: CAFj8pRA_j9z0KHOq==x85NKTokfOQ1MdtYhSvR95Bv7oK63VPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

po 22. 2. 2021 v 15:16 odesílatel Tavin Cole <tavin(dot)cole(at)gmail(dot)com> napsal:

> On Mon, Feb 22, 2021 at 11:56 AM Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>>
>> You'll have to post your actual query to get proper feedback.
>>
>> That said, a typical case for this specifically when going to version
>> 12 would be that CTEs are no longer optimization barriers. If you do
>> use CTEs in your query, try changing them from "WITH x AS (...)" to
>> "WITH x AS MATERIALIZED (...)" to return to the previous behaviour and
>> see if that improves things.
>>
>
> I don't know if I will be allowed to publicly post the actual query. I can
> say it's quite complicated, and involves the following features:
>
> * an inner SELECT DISTINCT from a particular primary table and some joins
> * a join to the same primary table in the outer query, among others
> * a join to a view in the outer query
> * said view uses CTEs like you say, though based on unrelated tables
> * then said view makes a UNION of 2 different SELECTs
> * each of these again joins to the same primary table
>
> So I'm not surprised the query planner gets confused. And your trick
> works. Materializing those CTEs makes the query planner choose the hash
> join again.
>
> I still think there is a bug here though. The query planner is getting
> something really wrong, and at least now we know it's a consequence of
> trying to optimize through the CTEs.
>
> I will see what I can do about sharing the actual query. If this is
> interesting to a pg developer, you're welcome to contact me.
>

The optimizer is controlled by estimations. With bad estimations, then the
result of the planner is random. Sometimes it is better, sometimes it is
worse. Materialized CTE works as an optimization barrier. In these cases it
can help - and in others the using materialized CTE can be worse.

you don't need share query - just send an execution plan -
https://explain.depesz.com/ can do anonymization of the execution plan, if
it is necessary.

Regards

Pavel

> Best,
> Tavin
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tavin Cole 2021-02-22 14:25:40 Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop
Previous Message Tavin Cole 2021-02-22 14:12:55 Re: BUG #16885: PostgreSQL upgrade to 12+ changes hash join to slow nested loop