Re: Adding LIMIT changes PostgreSQL plan from good to a bad one

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Michael Korbakov <rmihael(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Adding LIMIT changes PostgreSQL plan from good to a bad one
Date: 2019-11-21 17:13:56
Message-ID: CAFj8pRC2iP5qhxbDpA7c1RfksQPRHwXoxk96+=k=P1KKF7EDAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

čt 21. 11. 2019 v 17:19 odesílatel Michael Korbakov <rmihael(at)gmail(dot)com>
napsal:

> Hi everybody.
>
> I stumbled upon a weird problem with the query planner. I have a query
> on a typical EAV schema:
>
> SELECT contacts.id
> FROM contacts
> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS
> company_id, contacts_values.id AS id
> FROM contacts_values
> WHERE contacts_values.field_id =
> '\x000000000000000000000000'
> AND contacts_values.field_name = 'facebook'
> AND
>
> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value))
> =
>
> nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara'))
> AS anon_1
> ON anon_1.company_id = contacts.company_id AND
> anon_1.id = contacts.id
> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS
> company_id, contacts_values.id AS id
> FROM contacts_values
> WHERE contacts_values.field_id =
> '\x000000000000000000000000'
> AND contacts_values.field_name = 'last_name'
> AND
>
> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value))
> =
> nimble_contact_value_normalize('Ferrara')) AS
> anon_2
> ON anon_2.company_id = contacts.company_id AND
> anon_2.id = contacts.id
> JOIN contacts__aggregated AS contacts__aggregated_1
> ON contacts__aggregated_1.company_id = contacts.company_id AND
> contacts__aggregated_1.contact_id = contacts.id AND
> contacts__aggregated_1.field_name = 'names'
> WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
> AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
> ORDER BY contacts__aggregated_1.value ASC
> LIMIT 30 OFFSET 0;
>
> My problem is that the LIMIT clause in this query makes the planner
> choose a bad plan
> with nested loops: https://explain.depesz.com/s/Mute. Running the same
> query after
> SET ENABLE_NESTLOOP TO OFF I am getting a much more efficient plan:
> https://explain.depesz.com/s/b5kn. Removing the LIMIT from the query
> results in a
> similar plan: https://explain.depesz.com/s/wDqE.
>
> One thing that concerns me is that the cost of the LIMIT node in the
> bad nested loop
> plan is just a fraction of the cost of its subnode. But for the better
> merge join
> plan LIMIT node has the same cost as its subnode. How could it be this
> way? And what
> can I do to make the planner pick up a better plan?
>
> We are running PostgreSQL 10.10.
>

Sometimes are problems with LIMIT clause, because it too much decrease
costs. The system expects so necessary values are found quickly - but if
this premise is not valid, then this plan can be bad.

typical solution is wrapping to subquery and using OFFSET 0 (that is
optimizer fence)

SELECT * FROM foo WHERE x = 10 LIMIT 10 -- should be transformed

SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10;

But you can see in explain very bad estimations - left join is estimated to
918K rows and result is just 83 rows

There is relative very high rows removed in top part of query "Rows Removed
by Filter: 1043891"

It's EAV table - it is unoptimized pattern :/

Pavel

> Sincerely,
>
> -- Michael Korbakov
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jason L. Amerson 2019-11-21 17:44:22 RE: Remote Connection Help
Previous Message Israel Brewster 2019-11-21 17:09:31 Re: Tablespace setup issue