From: | Richard van den Berg <richard(dot)vandenberg(at)trust-factory(dot)com> |
---|---|
To: | John A Meinel <john(at)arbash-meinel(dot)com> |
Cc: | pgsql-perform <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: When are index scans used over seq scans? |
Date: | 2005-04-21 12:14:26 |
Message-ID: | 42679922.3040300@trust-factory.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thanks a lot John for the correct search terms. :-)
The suggestion in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00029.php to
add a constraint that checks (finishtime >= starttime) does not make a
difference for me. Still seq scans are used.
The width solution explained in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00027.php
and
http://archives.postgresql.org/pgsql-performance/2005-04/msg00116.php
does make a huge difference when selecting 1 timestamp using a BETWEEN
(2ms vs 2sec), but as soon as I put 2 timestamps in a table and try a
join, everything goes south (7.7sec). I have 10k timestamps in the
duration table. :-(
I'm getting more confused on how the planner decides to use indexes. For
example, if I try:
explain analyze select us.oid from sessions us where '2005-04-10
23:11:00' between us.starttimetrunc and us.finishtimetrunc;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using sessions_st_ft_idx2 on sessions us
(cost=0.00..18320.73 rows=4765 width=4) (actual time=0.063..2.455
rows=279 loops=1)
Index Cond: (('2005-04-10 23:11:00'::timestamp without time zone <=
finishtimetrunc) AND ('2005-04-10 23:11:00'::timestamp without time zone
>= starttimetrunc))
Total runtime: 2.616 ms
is uses the index! However, if I change the date it does not:
explain analyze select us.oid from sessions us where '2005-04-09
23:11:00' between us.starttimetrunc and us.finishtimetrunc;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sessions us (cost=0.00..68173.04 rows=41575 width=4)
(actual time=553.424..1981.695 rows=64 loops=1)
Filter: (('2005-04-09 23:11:00'::timestamp without time zone >=
starttimetrunc) AND ('2005-04-09 23:11:00'::timestamp without time zone
<= finishtimetrunc))
Total runtime: 1981.802 ms
The times in sessions go from '2005-04-04 00:00:00' to '2005-04-10
23:59:00' so both are valid times to query for, but April 10th is more
towards the end. A little experimenting shows that if I go earlier than
'2005-04-10 13:26:15' seq scans are being used. I was thinking this
timestamp would have something to do with the histogram_bounds in
pg_stats, but I cannot find a match:
starttimetrunc | {"2005-04-04 00:05:00","2005-04-04
11:49:00","2005-04-04 22:03:00","2005-04-05 10:54:00","2005-04-05
21:08:00","2005-04-06 10:28:00","2005-04-07 01:57:00","2005-04-07
15:55:00","2005-04-08 10:18:00","2005-04-08 17:12:00","2005-04-10 23:57:00"}
finishtimetrunc | {"2005-04-04 00:05:00.93","2005-04-04
11:53:00.989999","2005-04-04 22:35:00.38","2005-04-05
11:13:00.029999","2005-04-05 21:31:00.989999","2005-04-06
10:45:01","2005-04-07 02:08:08.25","2005-04-07 16:20:00.93","2005-04-08
10:25:00.409999","2005-04-08 17:15:00.949999","2005-04-11 02:08:19"}
--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a | www.trust-factory.com
2518AK The Hague | Phone: +31 70 3620684
The Netherlands | Fax : +31 70 3603009
-------------------------------------------
Have you visited our new DNA Portal?
-------------------------------------------
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Schuchardt | 2005-04-21 12:19:06 | Re: two queries and dual cpu (perplexed) |
Previous Message | Shoaib Burq (VPAC) | 2005-04-21 11:49:53 | two queries and dual cpu (perplexed) |