Re: SV: bad plan using nested loops

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Johan Fredriksson <eskil(at)kth(dot)se>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: SV: bad plan using nested loops
Date: 2018-02-05 19:15:02
Message-ID: b39edb72-258e-16ac-6f2f-4ae08bddddb7@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 02/02/2018 10:02 AM, Johan Fredriksson wrote:
> tor 2018-02-01 klockan 20:34 +0000 skrev Johan Fredriksson:
>>> Johan Fredriksson <eskil(at)kth(dot)se> writes:
>>>> Bad plan: https://explain.depesz.com/s/avtZ
>>>> Good plan: https://explain.depesz.com/s/SJSt
>>>> Any suggestions on how to make the planner make better decisions
>>>> for
>>>> this query?
>>>
>>> Core of the problem looks to be the misestimation here:
>>>
>>>        Index Only Scan using shredder_cgm1 on
>>> public.cachedgroupmembers cachedgroupmembers_4
>>> (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903
>>> rows=1492 loops=804)
>>>          Output: cachedgroupmembers_4.memberid,
>>> cachedgroupmembers_4.groupid,
>>> cachedgroupmembers_4.disabled
>>>          Index Cond: ((cachedgroupmembers_4.memberid =
>>> principals_1.id) AND
>>> (cachedgroupmembers_4.disabled = 0))
>>>          Heap Fetches: 5018
>>>
>>> Probably, memberid and disabled are correlated but the planner
>>> doesn't
>>> know that, so it thinks the index condition is way more selective
>>> than it
>>> actually is.  In PG 10, you could very possibly fix that by
>>> installing
>>> extended statistics on that pair of columns.  See
>>>
>>> https://www.postgresql.org/docs/current/static/planner-stats.html#P
>>> LANNER-STATS-EXTENDED
>>
>> I'm not sure what you mean by correlated, but there are only a
>> handful (164 when I check it) disabled groupmembers out of total 7.5
>> million.
>> I'll give CREATE STATISTICS on those columns a shot and see if it
>> gets any better.
>
> It looks like you are right, Tom. There actually exists full
> correlation between memberid, groupid and disabled.
>
> rt4=# SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext;
>  stxname  | stxkeys |   stxdependencies    
> -----------+---------+----------------------
>  cgm_stat2 | 2 6     | {"2
> => 6": 1.000000}
>  cgm_stat1 | 3 6     | {"3 => 6": 1.000000}
> (2 rows)
>
> However, this does not help the planner. It still picks the bad plan.
>

Yeah :-( Unfortunately, we're not using the extended statistics to
improve join cardinality estimates yet. PostgreSQL 10 can only use them
to improve estimates on individual tables, and judging by the progress
on already submitted improvements, it doesn't seem very likely to change
in PostgreSQL 11.

regards
Tomas

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-02-05 19:19:22 Re: postgresql 10.1 wrong plan in when using partitions bug
Previous Message Andreas Kretschmer 2018-02-05 16:33:27 Re: OT: Performance of VM