From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Querying with multicolumn index
Date: 2016-12-09 17:56:21
Message-ID: VisenaEmail.9.92889c6054ccae11.158e4b88adf@tc7-visena
Lists: pgsql-performance
On Friday, 9 December 2016 at 18:00:16, Eric Jiang <eric(at)doublemap(dot)com> wrote:
Hi all,
I have a query that I *think* should use a multicolumn index, but
sometimes doesn't, resulting in slow queries.
We have a DB that records GPS coordinates for vehicles:
                              Table "public.updates"
   Column   |           Type           |                      Modifiers
------------+--------------------------+------------------------------------------------------
 id         | integer                  | not null default nextval('updates_id_seq'::regclass)
 driver_id  | integer                  | not null
 latitude   | double precision         | not null
 longitude  | double precision         | not null
 time       | timestamp with time zone | not null default now()
 vehicle_id | integer                  |
Indexes:
"updates_pkey" PRIMARY KEY, btree (id)
"ix_updates_time" btree ("time")
"updates_driver_id_time_idx" btree (driver_id, "time")
"updates_vehicle_id_time_idx" btree (vehicle_id, "time")
Table has about 15M records across 100 distinct driver_id.
I want to get the last record for a specific driver:
SELECT * FROM updates WHERE driver_id=123 ORDER BY "time" DESC LIMIT 1;
For some values of driver_id, it does what I expect and uses
updates_driver_id_time_idx to fetch the records in 2 ms or less. For
other values of driver_id, it does an index scan backwards on
ix_updates_time, taking upwards of 2 minutes.
Good plan:
Limit  (cost=0.11..1.38 rows=1 width=56) (actual time=2.710..2.710 rows=1 loops=1)
  ->  Index Scan Backward using updates_driver_id_time_idx on updates  (cost=0.11..139278.28 rows=110051 width=56) (actual time=2.709..2.709 rows=1 loops=1)
        Index Cond: (driver_id = 17127)
Total runtime: 2.732 ms
(4 rows)
Bad plan:
Limit  (cost=0.09..0.69 rows=1 width=56) (actual time=216769.111..216769.112 rows=1 loops=1)
  ->  Index Scan Backward using ix_updates_time on updates  (cost=0.09..272339.04 rows=448679 width=56) (actual time=216769.110..216769.110 rows=1 loops=1)
        Filter: (driver_id = 30132)
        Rows Removed by Filter: 5132087
Total runtime: 216769.174 ms
From cursory testing, the difference seems to depend on how many
total rows there are for a particular driver. The above query uses
updates_driver_id_time_idx for drivers with fewer than about 300K rows,
but uses ix_updates_time for drivers with more than about 300K rows.
Anything we can do to make it do the "right" thing? We are also
considering denormalizing the data and keeping a "cache" of the same
data in another table.
pgsql version: 9.3.14 and 9.5.3, already tried vacuum analyze.
Thanks,
Eric
You should have this index:
create index updates_driver_time_idx ON updates(driver_id, "time" DESC);
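As a sketch (index name taken from this thread; the EXPLAIN step is illustrative, not output from the original poster's system):

```sql
-- Composite index with descending "time", matching the
-- ORDER BY "time" DESC LIMIT 1 access pattern:
CREATE INDEX updates_driver_time_idx ON updates (driver_id, "time" DESC);

-- Check whether the planner now uses the composite index
-- for one of the previously slow driver_ids:
EXPLAIN ANALYZE
SELECT * FROM updates
WHERE driver_id = 30132
ORDER BY "time" DESC
LIMIT 1;
```

If the planner still prefers the single-column time index, a commonly cited workaround is to order by both leading index columns, e.g. ORDER BY driver_id DESC, "time" DESC, so that only the composite index can satisfy the sort.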
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com
www.visena.com
Next Message: Eric Jiang, 2016-12-09 18:58:01, Re: Querying with multicolumn index
Previous Message: Merlin Moncure, 2016-12-09 17:03:29, Re: performance issue with bitmap index scans on huge amounts of big jsonb documents