GEQO and join_collapse_limit correlation

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: GEQO and join_collapse_limit correlation
Date: 2018-01-05 14:30:25
Message-ID: CAC+AXB3Ug25QUT+hnoZ2bk6w5oFfBbk1qaHu63G1qKm2NO=z1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

We recently had an issue in production. We have queries that are
procedurally generated by an Object/Relational Mapping framework. Some of
these queries are huge, involving over 120 tables.

With the following parameters the planner seemed to be getting very bad
plans for some of these queries (times are from a single execution, but
they are in those orders of magnitude):

----
from_collapse_limit = 14
join_collapse_limit = 14
geqo_threshold = 14
geqo_effort= 5

(cost=14691360.79..81261293.30 rows=6 width=15934)

Planning time: 3859.928 ms
Execution time: 6883365.973 ms
----

If we raise the join_collapse_limit to a really high value the plans are
much better, but (of course) planning time gets worse:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 14
geqo_effort= 5

(cost=379719.44..562997.32 rows=7 width=15934)

Planning time: 7112.416 ms
Execution time: 7.741 ms
----

After some testing in order to lower the planning time we ended bringing
down the GEQO values, and we have the best results with:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 2
geqo_effort= 2

(cost=406427.86..589667.55 rows=6 width=15934)

Planning time: 2721.099 ms
Execution time: 22.728 ms
----

Issues with the join_collapse_limit have been discussed before [1], but
lowering the GEQO values seems counterintuitive based on the documentation
for this parameter [2]: "Setting this value [join_collapse_limit] to
geqo_threshold or more may trigger use of the GEQO planner, resulting in
non-optimal plans."

What we want to know is if this mechanisms are working as intended and we
can follow a similar approach in the future (lower GEQO values), or this is
just a fluke for a corner case.

I have been able to reproduce a similar behaviour, to a much smaller scale,
with the attached scripts in Postgres 10.

[1] https://www.postgresql.org/message-id/25845.1483809942%40sss.pgh.pa.us
[2] https://www.postgresql.org/docs/current/static/runtime-config-query.html

Regards,

Juan José Santamaría

Attachment Content-Type Size
01_create_tables.sql text/plain 756 bytes
02a_query_default.sql text/plain 3.1 KB
02b_query_high_join.sql text/plain 3.1 KB
02c_query_low_geqo.sql text/plain 3.1 KB
03_drop_tables.sql text/plain 250 bytes

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2018-01-05 16:16:17 Re: GEQO and join_collapse_limit correlation
Previous Message Jeff Janes 2018-01-04 20:01:11 Re: primary key hash index