Quick Links

Re: Why don't use index on x when ORDER BY x, y?

From:	Robert Klemme <shortcutter(at)googlemail(dot)com>
To:	pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Why don't use index on x when ORDER BY x, y?
Date:	2014-11-24 13:39:11
Message-ID:	CAM9pMnPN1+pLNyEGMDuE-3CeA8DRwAhKNZ4cmmte0q11w4BKyg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Mon, Nov 24, 2014 at 12:02 PM, Vlad Arkhipov wrote:
> Hello,
>
> I wonder why Postgres does not use index in the query below? It is a quite
> common use-case when you want to sort records by an arbitrary set of
> columns but do not want to create a lot of compound indexes for all possible
> combinations of them. It seems that if, for instance, your query's ORDER BY
> is x, y, z then any of these indexes could be used to improve the
> performance: (x); (x, y); (x, y, z).
>
> create temp table t as
> select s as x, s % 10 as y, s % 100 as z
> from generate_series(1, 1000000) s;
>
> analyze t;
> create index idx1 on t (y);
>
> select *
> from t
> order by y desc, x
> limit 10;

For the record, here's the plan:

rklemme=> explain analyze
rklemme-> select *
rklemme-> from t
rklemme-> order by y desc, x
rklemme-> limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=36511.64..36511.67 rows=10 width=12) (actual
time=4058.863..4058.917 rows=10 loops=1)
-> Sort (cost=36511.64..39011.64 rows=1000000 width=12) (actual
time=4058.849..4058.868 rows=10 loops=1)
Sort Key: y, x
Sort Method: top-N heapsort Memory: 17kB
-> Seq Scan on t (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.031..2025.639 rows=1000000 loops=1)
Total runtime: 4058.992 ms
(6 rows)

Your index does not cover y AND x. If you remove "order by x" the
index will be used:

rklemme=> explain analyze
select *
from t
order by y desc
limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.86 rows=10 width=12) (actual time=0.081..0.284
rows=10 loops=1)
-> Index Scan Backward using idx1 on t (cost=0.42..43268.33
rows=1000000 width=12) (actual time=0.066..0.125 rows=10 loops=1)
Total runtime: 0.403 ms
(3 rows)

Now, with a different index:

rklemme=> create index idx2 on t (y desc, x);
CREATE INDEX
rklemme=> explain analyze
select *
from t
order by y desc, x
limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.88 rows=10 width=12) (actual time=0.127..0.290
rows=10 loops=1)
-> Index Scan using idx2 on t (cost=0.42..45514.12 rows=1000000
width=12) (actual time=0.112..0.165 rows=10 loops=1)
Total runtime: 0.404 ms
(3 rows)

rklemme=> select version();
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on i686-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 32-bit
(1 row)

Kind regards

--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/

In response to

Why don't use index on x when ORDER BY x, y? at 2014-11-24 11:02:18 from Vlad Arkhipov

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2014-11-24 15:38:46	Re: Why don't use index on x when ORDER BY x, y?
Previous Message	Vlad Arkhipov	2014-11-24 11:02:18	Why don't use index on x when ORDER BY x, y?