Re: Adding LIMIT changes PostgreSQL plan from good to a bad one

From: Michael Korbakov <rmihael(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Adding LIMIT changes PostgreSQL plan from good to a bad one
Date: 2019-11-21 21:04:28
Message-ID: CACLnFeEoZCi5+R_xcrSTAjZKKMZR4+_aVNehqZeAkbWY57SxMA@mail.gmail.com
Lists: pgsql-general

On November 21, 2019 at 19:14:33, Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com)
wrote:

čt 21. 11. 2019 v 17:19 odesílatel Michael Korbakov <rmihael(at)gmail(dot)com>
napsal:

> Hi everybody.
>
> I stumbled upon a weird problem with the query planner. I have a query
> on a typical EAV schema:
>
> SELECT contacts.id
> FROM contacts
> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS
> company_id, contacts_values.id AS id
> FROM contacts_values
> WHERE contacts_values.field_id =
> '\x000000000000000000000000'
> AND contacts_values.field_name = 'facebook'
> AND
>
> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value))
> =
>
> nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara'))
> AS anon_1
> ON anon_1.company_id = contacts.company_id AND
> anon_1.id = contacts.id
> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS
> company_id, contacts_values.id AS id
> FROM contacts_values
> WHERE contacts_values.field_id =
> '\x000000000000000000000000'
> AND contacts_values.field_name = 'last_name'
> AND
>
> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value))
> =
> nimble_contact_value_normalize('Ferrara')) AS
> anon_2
> ON anon_2.company_id = contacts.company_id AND
> anon_2.id = contacts.id
> JOIN contacts__aggregated AS contacts__aggregated_1
> ON contacts__aggregated_1.company_id = contacts.company_id AND
> contacts__aggregated_1.contact_id = contacts.id AND
> contacts__aggregated_1.field_name = 'names'
> WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
> AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
> ORDER BY contacts__aggregated_1.value ASC
> LIMIT 30 OFFSET 0;
>
> My problem is that the LIMIT clause in this query makes the planner
> choose a bad plan
> with nested loops: https://explain.depesz.com/s/Mute. Running the same
> query after
> SET ENABLE_NESTLOOP TO OFF I am getting a much more efficient plan:
> https://explain.depesz.com/s/b5kn. Removing the LIMIT from the query
> results in a
> similar plan: https://explain.depesz.com/s/wDqE.
>
> One thing that concerns me is that the cost of the LIMIT node in the
> bad nested loop
> plan is just a fraction of the cost of its subnode. But for the better
> merge join
> plan LIMIT node has the same cost as its subnode. How could it be this
> way? And what
> can I do to make the planner pick up a better plan?
>
> We are running PostgreSQL 10.10.
>

There are sometimes problems with the LIMIT clause, because it decreases the estimated cost too much. The planner assumes the necessary rows will be found quickly - but if that assumption does not hold, the chosen plan can be bad.

The LIMIT node on top of the nested loop decreases the cost proportionally to the requested number of rows versus the estimated number. That may be too aggressive, but the same LIMIT node on top of the merge join does not decrease the total cost at all. I do not understand why that happens.
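As a rough sketch of the mechanism (hypothetical numbers, not taken from our plans), the planner charges a LIMIT node only the fraction of the subplan's run cost it expects to consume:

```sql
-- Illustration of LIMIT cost scaling (hypothetical figures):
-- if a subplan is estimated to return 1,000,000 rows at total cost
-- 50000, a LIMIT 30 on top is costed at roughly
--     startup_cost + (30 / 1000000) * (50000 - startup_cost)
-- A nested loop has startup cost near zero, so LIMIT makes it look
-- almost free whenever the row estimate is inflated. A merge join
-- that must sort first pays most of its cost as startup, so LIMIT
-- barely reduces it -- consistent with the two plans linked above.
EXPLAIN SELECT id FROM contacts ORDER BY id LIMIT 30;
```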

A typical workaround is wrapping the query in a subquery and adding OFFSET 0 (which acts as an optimizer fence):

SELECT * FROM foo WHERE x = 10 LIMIT 10 -- should be transformed

SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10;

Unfortunately, that did not work for me (https://explain.depesz.com/s/slsM).
It looks like OFFSET 0 is not acting as a fence here. However, I had
some success with wrapping everything except the LIMIT into a CTE:
https://explain.depesz.com/s/n7c4.
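For reference, a sketch of the shape that worked - in PostgreSQL 10 a CTE is always materialized, so it acts as an optimizer fence even where OFFSET 0 does not (`candidates` is just an illustrative name):

```sql
WITH candidates AS (
    -- the full join query from above, without LIMIT/OFFSET,
    -- carrying the sort key out of the CTE
    SELECT contacts.id, contacts__aggregated_1.value
    FROM contacts
    ...
)
SELECT id FROM candidates
ORDER BY value ASC
LIMIT 30 OFFSET 0;
```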

But you can see very bad estimates in the EXPLAIN output - the left join is
estimated at 918K rows while the actual result is just 83 rows.

There is also a relatively large number of rows discarded in the top part of
the query: "Rows Removed by Filter: 1043891".

It's an EAV table - a pattern that is hard to optimize :/

Is there any way to tune the planner to choose a better plan for such
queries? I tried increasing default_statistics_target to 1000 and creating
extended statistics: CREATE STATISTICS contacts_values_company_id_field
(dependencies) ON company_id, field_id, field_name FROM contacts_values.
After running ANALYZE on all relevant tables I saw no change in the
planner's behavior.
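One more knob I have been considering (a sketch of the idea, not something verified to help here) is raising the statistics target only on the filtered columns rather than globally; note also that the filters go through an expression, and in PostgreSQL 10 the planner only collects statistics for an expression if an index exists on it:

```sql
-- Per-column statistics targets on the columns the filters touch
ALTER TABLE contacts_values ALTER COLUMN field_name SET STATISTICS 1000;
ALTER TABLE contacts_values ALTER COLUMN field_id   SET STATISTICS 1000;
-- An expression index would additionally make the planner gather
-- statistics on the normalized value used in the WHERE clause
CREATE INDEX contacts_values_norm_idx ON contacts_values
    (nimble_contact_value_normalize(nimble_skip_long_values(value)));
ANALYZE contacts_values;
```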

Pavel

> Sincerely,
>
> -- Michael Korbakov
>
>
>
