From: Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14302: SQL with LIMIT degrades performance seriously
Date: 2016-08-31 04:09:50
Message-ID: CAAkGvS9mu4SBi74iX+YfUEQBuotG7_WbpAaCcfyr-383L6N=-A@mail.gmail.com
Lists: pgsql-bugs
A suggestion:
In the SQL "select * from renren.user_relations where parent_id=846346 order
by user_id LIMIT 10;", the planner can get the number of rows with
parent_id=846346 from the btree index, right?
If so, the planner would know that it is better to fetch and sort those rows
(actually fewer than 2000) than to scan 10M rows in user_id order.
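As a quick sanity check of that idea, here is a sketch reusing the table and
value from this thread (the exact numbers will of course differ per database):
compare the planner's estimate for the predicate with the actual count.

EXPLAIN
SELECT * FROM renren.user_relations WHERE parent_id = 846346;
-- the "rows=" figure in the top node is the planner's estimate for this predicate

SELECT count(*) FROM renren.user_relations WHERE parent_id = 846346;
-- the actual number of matching rows (about 1725 according to this thread)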
On Wed, Aug 31, 2016 at 12:02 PM, Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
wrote:
> Thank you very much for your quick response!
>
> So I know I have to deal with my own solutions. Fortunately, I got the
> solution with the "WITH" clause:
> with t as (select * from renren.user_relations where parent_id=846346 order
> by user_id)
> select * from t LIMIT 10;
>
> which separates the ORDER BY from the LIMIT and so avoids the classic
> planning problem.
>
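For what it's worth, a sketch of how to confirm that the CTE form really
changes the plan. This assumes a release where WITH acts as an optimization
fence, which was true of every release current at the time of this thread;
newer releases need AS MATERIALIZED for the same effect.

EXPLAIN ANALYZE
WITH t AS (
    -- planned on its own, without knowledge of the outer LIMIT,
    -- so it is free to use the parent_id index plus an explicit sort
    SELECT *
    FROM renren.user_relations
    WHERE parent_id = 846346
    ORDER BY user_id
)
SELECT * FROM t LIMIT 10;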
> On Tue, Aug 30, 2016 at 11:45 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Mon, Aug 29, 2016 at 11:48 PM, <chenkaijiang(at)gmail(dot)com> wrote:
>>
>>
>>>
>>> the explain result:
>>>
>>> explain select * from renren.user_relations where parent_id=846346 order
>>> by
>>> user_id limit 10;
>>>
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------------------------------------
>>> Limit (cost=4.57..442.35 rows=10 width=102)
>>> -> Merge Append (cost=4.57..496534.92 rows=11342 width=102)
>>> Sort Key: user_relations.user_id
>>>
>> ...
>>
>>>
>>> It uses the Index Scan using index on user_id, which is very stupid.
>>>
>>
>> This is a classic planning problem with ORDER BY...LIMIT. Probably
>> parent_id is correlated with user_id, and if you pick a high value of
>> parent_id then you are implicitly getting high values of user_id. But
>> PostgreSQL doesn't know that; it assumes rows with parent_id=846346 are
>> randomly dispersed over the user_id values, and so expects to gather 10 of
>> them very quickly by walking the indexes in user_id order.
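One rough way to check whether the two columns really are correlated in this
table (a sketch using the names from the thread; corr() treats the ids as
plain numbers, so it only captures linear correlation):

SELECT corr(parent_id::double precision,
            user_id::double precision) AS parent_user_correlation
FROM renren.user_relations;
-- a value near 1 or -1 supports the explanation above;
-- a value near 0 suggests the ids really are independent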
>>
>>
>>>
>>> If I explain select * from renren.user_relations where parent_id=846346
>>> order by user_id, then it uses the index on parent_id to get the records
>>> and then sorts them, which is very wise since the number of qualifying
>>> records is 1725.
>>>
>>
>> You know it is 1725, but PostgreSQL thinks it is 11342. Is autoanalyze
>> analyzing often enough? Is default_statistics_target high enough?
>> (Although if I'm right about the correlation between parent_id and
>> user_id, then fixing that estimate might still not be enough to fix things).
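A sketch of how those two questions could be checked and acted on. The target
of 1000 is just an example value, and with inheritance-style partitioning each
child table keeps its own statistics, so the children would need the same
treatment.

-- when were the tables in this schema last analyzed?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'renren';

-- raise the per-column statistics target for parent_id and re-gather stats,
-- so the estimate gets closer to the real 1725
ALTER TABLE renren.user_relations
    ALTER COLUMN parent_id SET STATISTICS 1000;
ANALYZE renren.user_relations;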
>>
>>
>>> So, I think the optimizer/planner has a performance bug with LIMIT
>>> clause.
>>>
>>
>>
>> Well, it has to make decisions with the information available to it.
>> That is not really a bug. It is constantly being improved, but will never
>> be perfect.
>>
>> Cheers,
>>
>> Jeff
>>
>
>