Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

From: "John Surcombe" <John(dot)Surcombe(at)digimap(dot)gg>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1
Date: 2011-03-12 10:07:41
Message-ID: C0B87DEF8EF4F94C9D8FBD3151B627B26C0C23@digimapserver.digimap.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

We are using PostgreSQL 9.0.3, compiled by Visual C++ build 1500,
32-bit, installed on Windows 2003 R2 32-bit.

We have an 'aisposition' table used for a GPS tracking application,
containing ~30 million rows and a number of indexes. Two of these are:

idx_receiveddatetime: indexes aisposition(receiveddatetime timestamp)

idx_userid_receiveddatetime: indexes aisposition(userid int4 desc,
receiveddatetime timestamp desc)

The problem we get is that the following query is taking many minutes to
run:

select * from aisposition where userid = 311369000 order by userid desc,
receiveddatetime desc limit 1

When we 'EXPLAIN' this query, PostgreSQL says it is using the index
idx_receiveddatetime. The way the application is designed means that in
virtually all cases the query will have to scan a very long way into
idx_receiveddatetime to find the first record where userid = 311369000.
If however we delete the idx_receiveddatetime index, the query uses the
idx_userid_receiveddatetime index, and the query only takes a few
milliseconds.

The EXPLAIN ANALYZE output with idx_receiveddatetime in place is:

Limit (cost=0.00..1.30 rows=1 width=398) (actual
time=1128097.540..1128097.541 rows=1 loops=1)

-> Index Scan Backward using idx_receiveddatetime on aisposition
(cost=0.00..2433441.05 rows=1875926 width=398) (actual
time=1128097.532..1128097.532 rows=1 loops=1)

Filter: (userid = 311369000)

Total runtime: 1128097.609 ms

And with that index deleted:

Limit (cost=0.00..4.01 rows=1 width=398) (actual time=60.633..60.634
rows=1 loops=1)

-> Index Scan using idx_userid_receiveddatetime on aisposition
(cost=0.00..7517963.47 rows=1875926 width=398) (actual
time=60.629..60.629 rows=1 loops=1)

Index Cond: (userid = 311369000)

Total runtime: 60.736 ms

We would obviously prefer PostgreSQL to use the
idx_userid_receiveddatetime index in all cases, because we know that
this will guarantee results in a timely manner, whereas using
idx_receiveddatetime will usually require a scan of much of the table
and our application will not work. What are we doing wrong?

Cheers now,

John

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Marti Raudsepp 2011-03-12 17:07:29 Re: Tuning massive UPDATES and GROUP BY's?
Previous Message Kevin Grittner 2011-03-11 22:32:03 Re: ANTI-JOIN needs table, index scan not possible?