From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | anthony(dot)shipman(at)symstream(dot)com |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: strange query plan with LIMIT |
Date: | 2011-06-07 10:43:22 |
Message-ID: | BANLkTikqL1yKaaLSc20uoSS2zO-GPG56vg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello
did you run a ANALYZE statement on table tdiag? A statistics are
absolutelly out.
Regards
Pavel Stehule
2011/6/7 <anthony(dot)shipman(at)symstream(dot)com>:
> Version: PostgreSQL 8.3.5 (mammoth replicator)
>
> Schema:
>
> CREATE TABLE tdiag (
> diag_id integer DEFAULT nextval('diag_id_seq'::text),
> create_time timestamp with time zone default now(), /* time this record
> was created */
> diag_time timestamp with time zone not null,
> device_id integer, /* optional */
> fleet_id integer, /* optional */
> customer_id integer, /* optional */
> module character varying,
> node_kind smallint,
> diag_level smallint,
> tag character varying not null default '',
> message character varying not null default '',
> options text,
>
> PRIMARY KEY (diag_id)
> );
>
> create index tdiag_create_time ON tdiag(create_time);
>
> The number of rows is around 33 million with time stamps over the past two
> weeks.
> A VACUUM ANALYZE has been done recently on the table.
>
> The create_time order is almost identical to the id order. What I want
> to find is the first or last entry by id in a given time range. The
> query I am having a problem with is:
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id limit 1;
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358
> rows=1 loops=1)
> -> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=69425.352..69425.352 rows=1 loops=1)
> Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> Total runtime: 69425.400 ms
>
> PG seems to decide it must scan the diag_id column and filter each row by the
> create_time.
>
>
>
> If I leave out the limit I get
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id;
>
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=957632.43..960484.98 rows=1141019 width=114) (actual
> time=552.795..656.319 rows=86530
> loops=1)
> Sort Key: diag_id
> Sort Method: external merge Disk: 9872kB
> -> Bitmap Heap Scan on tdiag (cost=25763.48..638085.13 rows=1141019
> width=114) (actual
> time=43.232..322.441 rows=86530 loops=1)
> Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp
> with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> -> Bitmap Index Scan on tdiag_create_time (cost=0.00..25478.23
> rows=1141019 width=0)
> (actual time=42.574..42.574 rows=86530 loops=1)
> Index Cond: ((create_time >= '2011-06-03
> 19:49:04+10'::timestamp with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> Total runtime: 736.440 ms
> (8 rows)
>
>
>
>
> I can be explicit about the query order:
>
> select * into tt from tdiag where (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time <
> '2011-06-06 09:59:04.000000+0');
>
> symstream2=> explain analyze select * from tt order by diag_id limit 1;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
> Limit (cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166
> rows=1 loops=1)
> -> Sort (cost=2731.95..2948.28 rows=86530 width=101) (actual
> time=440.161..440.161 rows=1
> loops=1)
> Sort Key: diag_id
> Sort Method: top-N heapsort Memory: 17kB
> -> Seq Scan on tt (cost=0.00..2299.30 rows=86530 width=101) (actual
> time=19.602..330.873
> rows=86530 loops=1)
> Total runtime: 440.209 ms
> (6 rows)
>
>
>
> But if I try using a subquery I get
>
> symstream2=> explain analyze select * from (select * from tdiag where
> (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub
> order by diag_id limit
> 1;
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385
> rows=1 loops=1)
> -> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=90344.380..90344.380 rows=1 loops=1)
> Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> Total runtime: 90344.431 ms
>
>
> How do I make this both fast and simple?
> --
> Anthony Shipman | flailover systems: When one goes down it
> Anthony(dot)Shipman(at)symstream(dot)com | flails about until the other goes down too.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
From | Date | Subject | |
---|---|---|---|
Next Message | tv | 2011-06-07 16:40:13 | Re: strange query plan with LIMIT |
Previous Message | Craig Ringer | 2011-06-07 09:58:46 | Re: i want to ask monitory peformance memory postgresql with automatically |