From: | Andrew Rawnsley <ronz(at)ravensfield(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | annoying query/planner choice |
Date: | 2004-01-12 03:05:11 |
Message-ID: | 22837C75-44AC-11D8-8262-000393A47FCC@ravensfield.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I have a situation that is giving me small fits, and would like to see
if anyone can shed any light on it.
I have a modest table (@1.4 million rows, and growing), that has a
variety of queries run against it. One is
a very straightforward one - pull a set of distinct rows out based on
two columns, with a simple where clause
based on one of the indexed columns. For illustration here, I've
removed the distinct and order-by clauses, as
they are not the culprits.
Before I go on - v7.4.1, currently on a test box, dual P3, 1G ram, 10K
scsi, Slackware 9 or so. The table has been
vacuumed and analyzed. Even offered pizza and beer. Production box will
be a dual Xeon with 2G ram and RAID 5.
When the query is run with a where clause that returns small number of
rows, the query uses the index and is quite speedy:
rav=# explain analyze select casno, parameter from hai.results where
site_id = 9982;
QUERY PLAN
------------------------------------------------------------------------
--------------------------------------------------------------
Index Scan using hai_res_siteid_ndx on results (cost=0.00..7720.87
rows=2394 width=30) (actual time=12.118..12.933 rows=50 loops=1)
Index Cond: (site_id = 9982)
Total runtime: 13.145 ms
When a query is run that returns a much larger set, the index is not
used, I assume because the planner thinks that a sequential scan
would work just as well with a large result set:
rav=# explain analyze select casno, parameter from hai.results where
site_id = 18;
QUERY PLAN
------------------------------------------------------------------------
----------------------------------------------
Seq Scan on results (cost=0.00..73396.39 rows=211205 width=30)
(actual time=619.020..15012.807 rows=186564 loops=1)
Filter: (site_id = 18)
Total runtime: 15279.789 ms
(3 rows)
Unfortunately, its way off:
rav=# set enable_seqscan=off;
SET
rav=# explain analyze select casno, parameter from hai.results where
site_id = 18;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------------
Index Scan using hai_res_siteid_ndx on results (cost=0.00..678587.01
rows=211205 width=30) (actual time=9.575..3569.387 rows=186564 loops=1)
Index Cond: (site_id = 18)
Total runtime: 3872.292 ms
(3 rows)
I would like, of course, for it to use the index, given that it takes
20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything
until I exceed
0.5, which strikes me as a bit high (though please correct me if I am
assuming too much...). RANDOM_PAGE_COST seems to have no effect. I
suppose I could
cluster it, but it is constantly being added to, and would have to be
re-done on a daily basis (if not more).
Any suggestions?
--------------------
Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com
From | Date | Subject | |
---|---|---|---|
Next Message | Dennis Bjorklund | 2004-01-12 03:50:25 | Re: annoying query/planner choice |
Previous Message | Greg Stark | 2004-01-12 02:55:26 | Re: Explain not accurate |