Cost estimates consistently too high - does it matter?

From: "Matt Clark" <matt(at)ymogen(dot)net>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Cost estimates consistently too high - does it matter?
Date: 2003-08-08 11:47:20
Message-ID: OAEAKHEHCMLBLIDGAFELMEGBDGAA.matt@ymogen.net
Lists: pgsql-admin

Hi,

I've noticed that the cost estimates for a lot of my queries are
consistently far too high. Sometimes it's because the row estimates are
wrong, like this:

explain analyze select logtime from loginlog where uid='Ymogen::YM_User::3e2c0869c2fdd26d8a74d218d5a6ff585d490560' and result = 'Success' order by logtime desc limit 3;
NOTICE: QUERY PLAN:
Limit (cost=0.00..221.85 rows=3 width=8) (actual time=0.21..2.39 rows=3 loops=1)
  -> Index Scan Backward using loginlog_logtime_idx on loginlog (cost=0.00..12846.69 rows=174 width=8) (actual time=0.20..2.37 rows=4 loops=1)
Total runtime: 2.48 msec

The row estimate here is off by a factor of 50, but the cost estimate is off
by a factor of 5000.
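Just to quantify the mismatch from the plan above (a rough back-of-envelope calculation; note that the planner's cost is in abstract page-fetch units rather than milliseconds, so the second ratio is only an apples-to-oranges illustration of the scale of the gap):

```python
# Rough arithmetic from the EXPLAIN ANALYZE output above.
est_rows, actual_rows = 174, 4        # index scan: estimated vs. actual rows
est_cost, actual_ms = 12846.69, 2.48  # full scan cost estimate vs. total runtime

row_factor = est_rows / actual_rows   # ~43.5, i.e. roughly 50x
cost_factor = est_cost / actual_ms    # ~5180, i.e. roughly 5000x (units differ!)

print(round(row_factor, 1), round(cost_factor))
```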

Sometimes the row estimates are good, but the costs are still too high:

explain analyze select u.email from ym_user u join mobilepm m on (m.ownerid = u._id) where m.status = 'Validated' and m.network = 'TMOBILEUK';
NOTICE: QUERY PLAN:
Nested Loop (cost=0.00..2569.13 rows=441 width=145) (actual time=1.93..248.57 rows=553 loops=1)
  -> Seq Scan on mobilepm m (cost=0.00..795.11 rows=441 width=58) (actual time=1.69..132.83 rows=553 loops=1)
  -> Index Scan using ym_user_id_idx on ym_user u (cost=0.00..4.01 rows=1 width=87) (actual time=0.19..0.20 rows=1 loops=553)
Total runtime: 249.47 msec

loginlog has 180000 rows, mobilepm has 12000, ym_user has 50000, and they've
all been analyzed prior to running the query.

The server is a Quad PIII 700 Xeon/1MB cache, 3GB RAM, hardware RAID 10 on
two SCSI channels with 128MB write-back cache.

I've lowered random_page_cost to 2 to reflect the decent disk I/O. The DB and
indexes are essentially all cached in RAM, which might also be affecting the
results, although effective_cache_size is already set to a realistic 262144
pages (2GB). The planner params in full:

#effective_cache_size = 1000 # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025
effective_cache_size = 262144 # 2GB of FS cache
random_page_cost = 2
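For intuition on what these knobs feed into, here's a simplified sketch of the textbook planner cost formulas (my own paraphrase of the model, not PostgreSQL's actual code; the page and tuple counts below are made up purely for illustration):

```python
# Simplified sketch of the planner's cost model (not the real implementation).
# A sequential page fetch costs 1.0 by definition; random_page_cost is
# expressed relative to that.

def seq_scan_cost(pages, tuples, cpu_tuple_cost=0.01):
    """Approximate seq scan cost: one unit per page, plus CPU per tuple."""
    return pages * 1.0 + tuples * cpu_tuple_cost

def index_fetch_io_cost(pages_fetched, random_page_cost=2.0):
    """I/O component of an index scan's heap fetches: random reads."""
    return pages_fetched * random_page_cost

# Hypothetical numbers: halving random_page_cost from the default 4 halves
# the I/O part of every index-scan estimate, which is why index plans look
# relatively cheaper after the change.
print(seq_scan_cost(675, 12000))      # 795.0
print(index_fetch_io_cost(100, 4.0))  # 400.0 with the default
print(index_fetch_io_cost(100, 2.0))  # 200.0 with random_page_cost = 2
```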

For now the planner seems to be making the right choices, but my concern is
that at some point it might start making bad decisions, especially on more
complex queries. Should I bother tweaking the planner costs further, and if
so, which ones? Or am I fretting over nothing?

Cheers

Matt
Matt Clark
Ymogen Ltd
matt(at)ymogen(dot)net
corp.ymogen.net
