Join order optimization

From: Christian Beikov <christian(dot)beikov(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Join order optimization
Date: 2023-08-31 08:19:10
Message-ID: 7b36ee6b-495c-4a8e-a131-70150762430e@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I'm from the Hibernate team (Java ORM) and a user recently reported that
a change in our SQL rendering affected his query plans in a bad way.

In short, we decided to model certain constructs in our ORM with "nested
joins" i.e. using parenthesis to express the join order. This is what we
want to model semantically, though there are cases when we could detect
that we don't need the explicit join ordering to get the same semantics.
I expected that the PostgreSQL optimizer can do the same reasoning and
do further join re-ordering to produce an optimal plan, but to my
surprise it seems it isn't capable to do that.

The query we generate right now is of the following structure:

from tbl1 t1
join (tbl2 t2
    left join tbl3 t3 on t3.pkAndFk = t2.pk
    left join tbl4 t4 on t4.pkAndFk = t2.pk
    ...
    left join tbl9 t9 on t9.pkAndFk = t2.pk
) on t2.fk = t1.pk
where t1.indexedColumn = ...

whereas the query we generated before, which is semantically equivalent,
is the following:

from tbl1 t1
join tbl2 t2 on t2.fk = t1.pk
left join tbl3 t3 on t3.pkAndFk = t2.pk
left join tbl4 t4 on t4.pkAndFk = t2.pk
...
left join tbl9 t9 on t9.pkAndFk = t2.pk
where t1.indexedColumn = ...

You can find the full queries in the attachments section of the issue
report from the user: https://hibernate.atlassian.net/browse/HHH-16595

Query_Hibernate5.txt shows the old style query without parenthesis and
Query_Hibernate6.txt shows the new style. You will also find the query
plans for the two queries attached as CSV files.

It almost seems like the PostgreSQL optimizer sees the parenthesis for
join ordering as an optimization fence!?

The user reported that the behavior is reproducible in PostgreSQL
versions 11 and 15. He promised to provide a full reproducer for this
which I am still waiting for, but I'll share it with you as soon as that
was provided if needed.

I think that we can detect that the parenthesis is unnecessary in this
particular case, but ideally PostgreSQL would be able to detect this as
well to plan the optimal join order. Any ideas what is going on here? Is
this a bug or missed optimization in the query optimizer?

I'm a bit worried about what PostgreSQL will produce for queries that
really need the parenthesis for join ordering e.g.

from tbl1 t1
left join (tbl2 t2
    join tbl3 t3 on t3.pkAndFk = t2.pk
    join tbl4 t4 on t4.pkAndFk = t2.pk
    ...
    join tbl9 t9 on t9.pkAndFk = t2.pk
) on t2.fk = t1.pk
where t1.indexedColumn = ...

Thanks for any help.

Christian

Browse pgsql-performance by date

  From Date Subject
Next Message Maxim Boguk 2023-08-31 15:05:41 Re: Index bloat and REINDEX/VACUUM optimization for partial index
Previous Message jayaprabhakar k 2023-08-31 00:42:58 Re: Index bloat and REINDEX/VACUUM optimization for partial index