From: | Robert Klemme <shortcutter(at)googlemail(dot)com> |
---|---|
To: | pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Why don't use index on x when ORDER BY x, y? |
Date: | 2014-11-24 13:39:11 |
Message-ID: | CAM9pMnPN1+pLNyEGMDuE-3CeA8DRwAhKNZ4cmmte0q11w4BKyg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, Nov 24, 2014 at 12:02 PM, Vlad Arkhipov wrote:
> Hello,
>
> I wonder why Postgres does not use index in the query below? It is a quite
> common use-case when you want to sort records by an arbitrary set of
> columns but do not want to create a lot of compound indexes for all possible
> combinations of them. It seems that if, for instance, your query's ORDER BY
> is x, y, z then any of these indexes could be used to improve the
> performance: (x); (x, y); (x, y, z).
>
> create temp table t as
> select s as x, s % 10 as y, s % 100 as z
> from generate_series(1, 1000000) s;
>
> analyze t;
> create index idx1 on t (y);
>
> select *
> from t
> order by y desc, x
> limit 10;
For the record, here's the plan:
rklemme=> explain analyze
rklemme-> select *
rklemme-> from t
rklemme-> order by y desc, x
rklemme-> limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=36511.64..36511.67 rows=10 width=12) (actual
time=4058.863..4058.917 rows=10 loops=1)
-> Sort (cost=36511.64..39011.64 rows=1000000 width=12) (actual
time=4058.849..4058.868 rows=10 loops=1)
Sort Key: y, x
Sort Method: top-N heapsort Memory: 17kB
-> Seq Scan on t (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.031..2025.639 rows=1000000 loops=1)
Total runtime: 4058.992 ms
(6 rows)
Your index does not cover y AND x. If you remove "order by x" the
index will be used:
rklemme=> explain analyze
select *
from t
order by y desc
limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.86 rows=10 width=12) (actual time=0.081..0.284
rows=10 loops=1)
-> Index Scan Backward using idx1 on t (cost=0.42..43268.33
rows=1000000 width=12) (actual time=0.066..0.125 rows=10 loops=1)
Total runtime: 0.403 ms
(3 rows)
Now, with a different index:
rklemme=> create index idx2 on t (y desc, x);
CREATE INDEX
rklemme=> explain analyze
select *
from t
order by y desc, x
limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.88 rows=10 width=12) (actual time=0.127..0.290
rows=10 loops=1)
-> Index Scan using idx2 on t (cost=0.42..45514.12 rows=1000000
width=12) (actual time=0.112..0.165 rows=10 loops=1)
Total runtime: 0.404 ms
(3 rows)
rklemme=> select version();
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on i686-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 32-bit
(1 row)
Kind regards
--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-11-24 15:38:46 | Re: Why don't use index on x when ORDER BY x, y? |
Previous Message | Vlad Arkhipov | 2014-11-24 11:02:18 | Why don't use index on x when ORDER BY x, y? |