strange query plan with LIMIT

From: anthony(dot)shipman(at)symstream(dot)com
To: pgsql-performance(at)postgresql(dot)org
Subject: strange query plan with LIMIT
Date: 2011-06-07 08:02:08
Message-ID: 201106071802.08272.anthony.shipman@symstream.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Version: PostgreSQL 8.3.5 (mammoth replicator)

Schema:

CREATE TABLE tdiag (
diag_id integer DEFAULT nextval('diag_id_seq'::text),
create_time timestamp with time zone default now(), /* time this record
was created */
diag_time timestamp with time zone not null,
device_id integer, /* optional */
fleet_id integer, /* optional */
customer_id integer, /* optional */
module character varying,
node_kind smallint,
diag_level smallint,
tag character varying not null default '',
message character varying not null default '',
options text,

PRIMARY KEY (diag_id)
);

create index tdiag_create_time ON tdiag(create_time);

The number of rows is over 33 million with time stamps over the past two
weeks.

The create_time order is almost identical to the id order. What I want
to find is the first or last entry by id in a given time range. The
query I am having a problem with is:

symstream2=> explain analyze select * from tdiag where (create_time
>= '2011-06-03
09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
diag_id limit 1;


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358
rows=1 loops=1)
-> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76
rows=1141019 width=114)
(actual time=69425.352..69425.352 rows=1 loops=1)
Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
time zone) AND
(create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
Total runtime: 69425.400 ms

PG seems to decide it must scan the diag_id column and filter each row by the
create_time.

If I leave out the limit I get

symstream2=> explain analyze select * from tdiag where (create_time
>= '2011-06-03
09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
diag_id;


QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=957632.43..960484.98 rows=1141019 width=114) (actual
time=552.795..656.319 rows=86530
loops=1)
Sort Key: diag_id
Sort Method: external merge Disk: 9872kB
-> Bitmap Heap Scan on tdiag (cost=25763.48..638085.13 rows=1141019
width=114) (actual
time=43.232..322.441 rows=86530 loops=1)
Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp
with time zone) AND
(create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
-> Bitmap Index Scan on tdiag_create_time (cost=0.00..25478.23
rows=1141019 width=0)
(actual time=42.574..42.574 rows=86530 loops=1)
Index Cond: ((create_time >= '2011-06-03
19:49:04+10'::timestamp with time zone) AND
(create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
Total runtime: 736.440 ms
(8 rows)

I can be explicit about the query order:

select * into tt from tdiag where (create_time >= '2011-06-03
09:49:04.000000+0' and create_time <
'2011-06-06 09:59:04.000000+0');

symstream2=> explain analyze select * from tt order by diag_id limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166
rows=1 loops=1)
-> Sort (cost=2731.95..2948.28 rows=86530 width=101) (actual
time=440.161..440.161 rows=1
loops=1)
Sort Key: diag_id
Sort Method: top-N heapsort Memory: 17kB
-> Seq Scan on tt (cost=0.00..2299.30 rows=86530 width=101) (actual
time=19.602..330.873
rows=86530 loops=1)
Total runtime: 440.209 ms
(6 rows)

But if I try using a subquery I get

symstream2=> explain analyze select * from (select * from tdiag where
(create_time >= '2011-06-03
09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub
order by diag_id limit
1;


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385
rows=1 loops=1)
-> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76
rows=1141019 width=114)
(actual time=90344.380..90344.380 rows=1 loops=1)
Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
time zone) AND
(create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
Total runtime: 90344.431 ms

How do I make this both fast and simple?
--
Anthony Shipman | Tech Support: The guys who follow the
Anthony(dot)Shipman(at)symstream(dot)com | 'Parade of New Products' with a shovel.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2011-06-07 08:19:29 Re: 8.4/9.0 simple query performance regression
Previous Message Didik Prasetyo 2011-06-07 07:47:59 i want to ask monitory peformance memory postgresql with automatically