Re: Query planner chooses index scan backward instead of better index option

From: Seckin Pulatkan <seckinpulatkan(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query planner chooses index scan backward instead of better index option
Date: 2016-11-15 08:44:57
Message-ID: CAEO+mDhmOhZkpY3E+X_sAHk-4p0WOc9-wg10qpfbbqchfkR2YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you, Jeff for your reply.

Yes, we tested with CTE as well but we are using Hibernate to generate the
query and there are some more conditions that can be added if certain
parameters supplied. For my knowledge, Hibernate is still not supporting
CTE structures yet. That's why I will keep this as last resort to convert
it to native query but much appreciated for the info you gave how query
planner is thinking.

explain (analyze, buffers)
with cte as (select booking0_.*
from booking booking0_
where (booking0_.customer_id in (select customer1_.id from customer
customer1_ where (lower((customer1_.first_name||' '||customer1_.last_name))
like '%sahby%')))
)
select * from cte
order by cte.id desc limit 30

QUERY PLAN
------------------------------------------------------------
------------------------------------------------------------
-----------------------------------
Limit (cost=34171.73..34171.80 rows=30 width=1237) (actual
time=321.370..321.371 rows=4 loops=1)
Buffers: shared hit=18 read=1680
CTE cte
-> Nested Loop (cost=3384.39..33967.93 rows=5155 width=241) (actual
time=309.167..321.312 rows=4 loops=1)
Buffers: shared hit=15 read=1680
-> Bitmap Heap Scan on customer customer1_
(cost=3383.96..11612.18 rows=5155 width=4) (actual time=302.196..310.625
rows=4 loops=1)
Recheck Cond: (lower((((first_name)::text || ' '::text) ||
(last_name)::text)) ~~ '%sahby%'::text)
Heap Blocks: exact=3
Buffers: shared hit=5 read=1674
-> Bitmap Index Scan on idx_customer_name_lower
(cost=0.00..3382.67 rows=5155 width=0) (actual time=300.142..300.142 rows=4
loops=1)
Index Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%sahby%'::text)
Buffers: shared hit=5 read=1671
-> Index Scan using idx_booking_customer_id on booking
booking0_ (cost=0.43..4.33 rows=1 width=241) (actual time=2.666..2.667
rows=1 loops=4)
Index Cond: (customer_id = customer1_.id)
Buffers: shared hit=10 read=6
-> Sort (cost=203.80..216.69 rows=5155 width=1237) (actual
time=321.368..321.369 rows=4 loops=1)
Sort Key: cte.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=18 read=1680
-> CTE Scan on cte (cost=0.00..51.55 rows=5155 width=1237)
(actual time=309.173..321.327 rows=4 loops=1)
Buffers: shared hit=15 read=1680
Planning time: 92.501 ms
Execution time: 321.521 ms

I will also share another info.. We have also passenger table, same as
customer regards to this name fields and search but relation is different
then.. Passenger (4.2 million records) has booking_id then the query
planner behaves different. It runs the in clause query first.

explain (analyze, buffers)
select booking0_.*
from booking booking0_
where (booking0_.id in (select p.booking_id from passenger p where
(lower((p.first_name||' '||p.last_name)) like '%sahby%')))
order by booking0_.id desc limit 30

QUERY PLAN
------------------------------------------------------------
------------------------------------------------------------
----------------------------------------
Limit (cost=4871.81..4871.88 rows=30 width=241) (actual
time=91.867..91.868 rows=4 loops=1)
Buffers: shared hit=22 read=1683
-> Sort (cost=4871.81..4872.76 rows=383 width=241) (actual
time=91.866..91.866 rows=4 loops=1)
Sort Key: booking0_.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=22 read=1683
-> Nested Loop (cost=4107.13..4860.49 rows=383 width=241)
(actual time=90.791..91.850 rows=4 loops=1)
Buffers: shared hit=22 read=1683
-> HashAggregate (cost=4106.70..4107.55 rows=170 width=4)
(actual time=86.624..86.627 rows=4 loops=1)
Group Key: p.booking_id
Buffers: shared hit=10 read=1679
-> Bitmap Heap Scan on passenger p
(cost=3366.97..4105.74 rows=383 width=4) (actual time=86.561..86.613 rows=4
loops=1)
Recheck Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%sahby%'::text)
Heap Blocks: exact=4
Buffers: shared hit=10 read=1679
-> Bitmap Index Scan on
idx_passenger_name_lower (cost=0.00..3366.88 rows=383 width=0) (actual
time=80.148..80.148 rows=4 loops=1)
Index Cond: (lower((((first_name)::text ||
' '::text) || (last_name)::text)) ~~ '%sahby%'::text)
Buffers: shared hit=7 read=1678
-> Index Scan using pk_booking_id on booking booking0_
(cost=0.43..4.42 rows=1 width=241) (actual time=1.300..1.301 rows=1 loops=4)
Index Cond: (id = p.booking_id)
Buffers: shared hit=12 read=4
Planning time: 39.774 ms
Execution time: 92.085 ms

Regards,

Seckin

ps: sorry Jeff for double email.

On Mon, Nov 14, 2016 at 7:50 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Nov 14, 2016 at 4:01 AM, Seckin Pulatkan <seckinpulatkan(at)gmail(dot)com
> > wrote:
>
>> Hi,
>>
>> On our production environment (PostgreSQL 9.4.5 on
>> x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
>> 4.8.5-4), 64-bit), one of our queries runs very slow, about 5 minutes . We
>> noticed that it does not use an index that we anticapited it would.
>>
>> The query is
>>
>> select booking0_.*
>> from booking booking0_
>> where booking0_.customer_id in (
>> select customer1_.id
>> from customer customer1_
>> where lower((customer1_.first_name||'
>> '||customer1_.last_name)) like '%gatef%'
>> )
>> order by booking0_.id desc
>> limit 30;
>>
>
>
> It thinks it is going to find 30 rows which meet your condition very
> quickly, so by walking the index backwards it can avoid needing to do a
> sort. But, the rows which meet your sub-select conditions are biased
> towards the front of the index, so in fact it was to walk backwards through
> most of your index before finding 30 eligible rows.
>
> Your best bet is probably to force it into the plan you want by using a
> CTE:
>
> with t as
> (select booking0_.*
> from booking booking0_
> where booking0_.customer_id in (
> select customer1_.id
> from customer customer1_
> where lower((customer1_.first_name||'
> '||customer1_.last_name)) like '%gatef%'
> ) select * from t order by booking0_.id desc limit 30;
>
> Cheers,
>
> Jeff
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Henrik Ekenberg 2016-11-15 13:27:13 Sql Query :: Any advice ?
Previous Message Merlin Moncure 2016-11-14 21:24:03 Re: Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment