Query planner chooses index scan backward instead of better index option

From: Seckin Pulatkan <seckinpulatkan(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query planner chooses index scan backward instead of better index option
Date: 2016-11-14 12:01:27
Message-ID: CAEO+mDiq+Y2gex9O+SfsPJ4ZebBEh3mp2PUZ+5daPYB4L_DiaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

On our production environment (PostgreSQL 9.4.5 on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
4.8.5-4), 64-bit), one of our queries runs very slow, about 5 minutes . We
noticed that it does not use an index that we anticapited it would.

The query is

select booking0_.*
from booking booking0_
where booking0_.customer_id in (
select customer1_.id
from customer customer1_
where lower((customer1_.first_name||'
'||customer1_.last_name)) like '%gatef%'
)
order by booking0_.id desc
limit 30;

We have just over 3.2 million records on booking and customer tables.

1.

QUERY PLAN

2.

Limit (cost=0.86..11549.23 rows=30 width=241) (actual
time=9459.997..279283.497 rows=10 loops=1)

3.

-> Nested Loop Semi Join (cost=0.86..1979391.88 rows=5142
width=241) (actual time=9459.995..279283.482 rows=10 loops=1)

4.

-> Index Scan Backward using pk_booking_id on booking
booking0_ (cost=0.43..522902.65 rows=2964333 width=241) (actual
time=0.043..226812.994 rows=3212711 loops=1)

5.

-> Index Scan using pk_customer_id on customer customer1_
(cost=0.43..0.49 rows=1 width=4) (actual time=0.016..0.016 rows=0
loops=3212711)

6.

Index Cond: (id = booking0_.customer_id)

7.

Filter: (lower((((first_name)::text || ' '::text) ||
(last_name)::text)) ~~ '%gatef%'::text)

8.

Rows Removed by Filter: 1

9.

Planning time: 2.901 ms

10.

Execution time: 279283.646 ms

The index that we expect it to use is

CREATE INDEX idx_customer_name_lower
ON customer
USING gin
(lower((first_name::text || ' '::text) || last_name::text) COLLATE
pg_catalog."default" gin_trgm_ops);

explain (analyze, buffers)
select customer1_.id
from customer customer1_
where lower((customer1_.first_name||' '||customer1_.last_name)) like
'%gatef%';

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on customer customer1_ (cost=2875.87..11087.13 rows=5144
width=4) (actual time=768.692..1571.241 rows=11 loops=1)
Recheck Cond: (lower((((first_name)::text || ' '::text) ||
(last_name)::text)) ~~ '%gatef%'::text)
Heap Blocks: exact=11
Buffers: shared hit=1420 read=23
-> Bitmap Index Scan on idx_customer_name_lower (cost=0.00..2874.59
rows=5144 width=0) (actual time=763.327..763.327 rows=11 loops=1)
Index Cond: (lower((((first_name)::text || ' '::text) ||
(last_name)::text)) ~~ '%gatef%'::text)
Buffers: shared hit=1418 read=14
Planning time: 240.111 ms
Execution time: 1571.403 ms

And then filter with customer_id index on booking table

CREATE INDEX idx_booking_customer_id
ON booking
USING btree
(customer_id);

We have also created an index on booking table for id desc and customer_id

create index concurrently idx_booking_id_desc_customer_id on booking
using btree(id desc, customer_id);

But result was same

1.

QUERY PLAN

2.

Limit (cost=0.86..12223.57 rows=30 width=241) (actual
time=1282.724..197879.302 rows=10 loops=1)

3.

-> Nested Loop Semi Join (cost=0.86..2094972.51 rows=5142
width=241) (actual time=1282.724..197879.292 rows=10 loops=1)

4.

-> Index Scan Backward using pk_booking_id on booking
booking0_ (cost=0.43..525390.04 rows=3212872 width=241) (actual
time=0.012..131563.721 rows=3212879 loops=1)

5.

-> Index Scan using pk_customer_id on customer customer1_
(cost=0.43..0.49 rows=1 width=4) (actual time=0.020..0.020 rows=0
loops=3212879)

6.

Index Cond: (id = booking0_.customer_id)

7.

Filter: (lower((((first_name)::text || ' '::text) ||
(last_name)::text)) ~~ '%gatef%'::text)

8.

Rows Removed by Filter: 1

9.

Planning time: 0.424 ms

10.

Execution time: 197879.348 ms

If we remove "order by id desc" then it uses index that we expect it to
use. But we need that order by clause: with same query we are using a
pagination (offset) if there are more than 30 records.

1.

QUERY PLAN

2.

Limit (cost=2790.29..2968.29 rows=30 width=241) (actual
time=27.932..38.643 rows=10 loops=1)

3.

-> Nested Loop (cost=2790.29..33299.63 rows=5142 width=241)
(actual time=27.931..38.640 rows=10 loops=1)

4.

-> Bitmap Heap Scan on customer customer1_
(cost=2789.86..10997.73 rows=5142 width=4) (actual time=27.046..27.159
rows=11 loops=1)

5.

Recheck Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%gatef%'::text)

6.

Heap Blocks: exact=11

7.

-> Bitmap Index Scan on idx_customer_name_lower
(cost=0.00..2788.57 rows=5142 width=0) (actual time=27.013..27.013
rows=11 loops=1)

8.

Index Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%gatef%'::text)

9.

-> Index Scan using idx_booking_customer_id on booking
booking0_ (cost=0.43..4.33 rows=1 width=241) (actual
time=1.041..1.041 rows=1 loops=11)

10.

Index Cond: (customer_id = customer1_.id)

11.

Planning time: 0.414 ms

12.

Execution time: 38.757 ms

One a different database with 450K records it uses idx_customer_name_lower

"Limit (cost=3982.71..3982.79 rows=30 width=597) (actual time=0.166..0.166
rows=0 loops=1)"
" Buffers: shared hit=10"
" -> Sort (cost=3982.71..3984.49 rows=711 width=597) (actual
time=0.165..0.165 rows=0 loops=1)"
" Sort Key: booking0_.id"
" Sort Method: quicksort Memory: 25kB"
" Buffers: shared hit=10"
" -> Nested Loop (cost=25.94..3961.71 rows=711 width=597) (actual
time=0.159..0.159 rows=0 loops=1)"
" Buffers: shared hit=10"
" -> Bitmap Heap Scan on customer customer1_
(cost=25.52..1133.10 rows=711 width=4) (actual time=0.159..0.159 rows=0
loops=1)"
" Recheck Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%gatef%'::text)"
" Buffers: shared hit=10"
" -> Bitmap Index Scan on idx_customer_name_lower
(cost=0.00..25.34 rows=711 width=0) (actual time=0.157..0.157 rows=0
loops=1)"
" Index Cond: (lower((((first_name)::text || '
'::text) || (last_name)::text)) ~~ '%gatef%'::text)"
" Buffers: shared hit=10"
" -> Index Scan using idx_booking_id_desc_customer_id on
booking booking0_ (cost=0.42..3.97 rows=1 width=597) (never executed)"
" Index Cond: (customer_id = customer1_.id)"
"Planning time: 1.052 ms"
"Execution time: 0.241 ms"

We are using autovacuum but we have also run vacuum analyze on those tables
explicitly. Also every morning vacuum analyze is working on this database.

autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 500
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1

some configurations we have changed :

random_page_cost = 2.0
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.005
shared_buffers = 4GB
work_mem = 128MB

As history, before gin index, we were using btree index on first_name and
last_name columns and we were searching with 'gatef%', so only find names
start with given parameter. We were not satisfied with OR condition there
(beside we wanted to use a "contains" search), that's why we choose to
create GIN index. Individually, if you search on customer it is really
fast. In our development database with less amount of data, we also saw
query planner choose this index instead of index scan backward but with
more data like on production, it chooses not to use this index.

Would you have any suggestions, that we can improve the execution time of
this query?

Thanks in advance.

Seckin

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2016-11-14 16:50:13 Re: Query planner chooses index scan backward instead of better index option
Previous Message Pietro Pugni 2016-11-14 11:45:38 Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment