Re: ORDER BY, LIMIT and indexes

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Ivan Voras <ivoras(at)freebsd(dot)org>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: ORDER BY, LIMIT and indexes
Date: 2013-08-06 22:56:31
Message-ID: 52017F1F.7020700@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 06/08/13 22:46, Ivan Voras wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..1.71 rows=10 width=9) (actual
> time=142.669..142.680 rows=10 loops=1)
> -> Seq Scan on lt (cost=0.00..17402.00 rows=101630 width=9)
> (actual time=142.665..142.672 rows=10 loops=1)
> Filter: (id > 900000)
> Total runtime: 142.735 ms
> (4 rows)
>
> Note the Seq Scan.
>
> ivoras=# explain analyze select * from lt where id > 900000;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on lt (cost=1683.97..7856.35 rows=101630 width=9)
> (actual time=38.462..85.780 rows=100000 loops=1)
> Recheck Cond: (id > 900000)
> -> Bitmap Index Scan on lt_pkey (cost=0.00..1658.56 rows=101630
> width=0) (actual time=38.310..38.310 rows=100000 loops=1)
> Index Cond: (id > 900000)
> Total runtime: 115.674 ms
> (5 rows)
>
> This somewhat explains the above case - we are simply fetching 100,000
> records here, and it's slow enough even with the index scan, so
> planner skips the index in the former case. BUT, if it did use the
> index, it would have been expectedly fast:
>
> ivoras=# set enable_seqscan to off;
> SET
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
> rows=10 loops=1)
> -> Index Scan using lt_pkey on lt (cost=0.00..17644.17
> rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
> Index Cond: (id > 900000)
> Total runtime: 0.175 ms
> (4 rows)
>
> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).
>
>

Hmm - I wonder if the lack or ORDER BY is part of the problem here.
Consider a similar query on pgbench_accounts:

bench=# explain analyze select aid from pgbench_accounts where aid >
100000 limit 20;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464
rows=20 loops=1)
-> Seq Scan on pgbench_accounts (cost=0.00..499187.31 rows=10994846
width=4) (actual time=0.005..0.463 rows=20 loops=1)
Filter: (aid > 100000)
Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid >
10000000 limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018
rows=20 loops=1)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017
rows=20 loops=1)
Index Cond: (aid > 10000000)
Total runtime: 0.030 ms
(4 rows)

So at some point you get index scans. Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid >
100000 order by aid limit 20;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
--
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012
rows=20 loops=1)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011
rows=20 loops=1
)
Index Cond: (aid > 100000)
Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid >
10000000 order by aid limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018
rows=20 loops=1)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016
rows=20 loops=1)
Index Cond: (aid > 10000000)
Total runtime: 0.029 ms
(4 rows)

...and we have index scans for both cases.

Cheers

Mark

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2013-08-06 23:03:59 Re: ORDER BY, LIMIT and indexes
Previous Message Scott Marlowe 2013-08-06 22:12:18 Re: subselect requires offset 0 for good performance.