Re: Querying with multicolumn index

From: Eric Jiang <eric(at)doublemap(dot)com>
To: Daniel Blanch Bataller <daniel(dot)blanch(dot)bataller(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Eric Clark <eclark(at)doublemap(dot)com>
Subject: Re: Querying with multicolumn index
Date: 2016-12-10 20:15:35
Message-ID: CAOfJSTxCfzoKxZ72nFV-Ow8mSch8b6C8qS6efH_1_Oi7SA7acQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We aren't using any special planner settings - all enable_* options are "on".

On Sat, Dec 10, 2016 at 12:06 AM, Daniel Blanch Bataller
<daniel(dot)blanch(dot)bataller(at)gmail(dot)com> wrote:
> As a quick fix: Have you considered dropping ix_updates_time index?

We do occasionally want to use ix_updates_time, although not very often.

> I’ve been able to reproduce the issue, but with bigger numbers than you. When I dropped ix_updates_time it went much much faster. It uses updates_driver_id_time_idx index instead.
>
> For some reason the planner does not make a good estimation in this case. Can you show us EXPLAIN (ANALYZE, BUFFERS) before and after dropping ix_updates_time index? Can you show us too the output of SHOW shared_buffers; ?

Here's EXPLAIN (ANALYZE, BUFFERS) with the above bad query on a cold cache:

Limit (cost=0.09..0.70 rows=1 width=56) (actual
time=244846.915..244846.915 rows=1 loops=1)
Buffers: shared hit=3999254 read=57831
I/O Timings: read=242139.661
-> Index Scan Backward using ix_updates_time on updates
(cost=0.09..271696.74 rows=442550 width=56) (actual
time=244846.913..244846.913 rows=1 loops=1)
Filter: (driver_id = 30132)
Rows Removed by Filter: 5316811
Buffers: shared hit=3999254 read=57831
I/O Timings: read=242139.661
Total runtime: 244846.946 ms

and after dropping ix_updates_time:

Limit (cost=0.11..0.98 rows=1 width=56) (actual time=2.270..2.271
rows=1 loops=1)
Buffers: shared hit=1 read=4
I/O Timings: read=2.230
-> Index Scan Backward using updates_driver_id_time_idx on updates
(cost=0.11..382307.69 rows=442550 width=56) (actual time=2.270..2.270
rows=1 loops=1)
Index Cond: (driver_id = 30132)
Buffers: shared hit=1 read=4
I/O Timings: read=2.230
Total runtime: 2.305 ms

and `SHOW shared_buffers;`

shared_buffers
----------------
244MB

> I suspect the issue has to do with low shared_buffers configuration and cache misses, and maybe some costs suboptimal configuration I’ll try to find it out, if anyone can enlighten us it will be very welcomed.
>
>
> P.S. Meanwhile If you still need 'time' index, you can create an index using ‘time' and ‘customer_id' in this order.

Did you mean an index on (time, driver_id)? I did:

CREATE INDEX CONCURRENTLY ix_updates_time_driver_id ON updates
("time", driver_id)

but seems like the planner will use it for driver_id having more than
~300k rows:

Limit (cost=0.11..0.79 rows=1 width=56) (actual
time=115.051..115.052 rows=1 loops=1)
Buffers: shared hit=20376
-> Index Scan Backward using ix_updates_time_driver_id on updates
(cost=0.11..302189.90 rows=443924 width=56) (actual
time=115.048..115.048 rows=1 loops=1)
Index Cond: (driver_id = 30132)
Buffers: shared hit=20376
Total runtime: 115.091 ms

It does seem faster than when having an index on just "time", but
still not optimal.

Really appreciate everyone's help!

--
Eric Jiang, DoubleMap
eric(at)doublemap(dot)com | www.doublemap.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2016-12-10 20:34:18 Re: Querying with multicolumn index
Previous Message Daniel Blanch Bataller 2016-12-10 08:06:31 Re: Querying with multicolumn index