From: | willmington(at)gmx(dot)net (R(dot) Willmington) |
---|---|
To: | pgsql-admin(at)postgresql(dot)org |
Subject: | Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3 |
Date: | 2004-11-03 08:50:57 |
Message-ID: | e6568d.0411030050.494af7ab@posting.google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Ladies and gentlemen,
I have a strange problem with index usage:
We have two identical database layouts in two different databases
in the same db server:
Postgres 7.3.2 Server
Database 1 Database 2
|- cms_contents |- cms_contents
|- cms_log |- cms_log
Nedless to say the tables have the same indices.
When running the following query, the planner
decides to use all suitable table indices in database 1, but for the
tables in database 2 it doesn't. This results in the query for
database 1 taking about 1 second, whereas for database 2 the query
takes about 4 minutes (!).
SELECT a.id, b.log_timestamp as timestamp
FROM cms_contents a, cms_log b
WHERE a.state_id = 'd'
AND a.parent_id NOT IN (
SELECT id FROM cms_contents
WHERE state_id = 'd'
AND language='de' and domain_id = 1
)
AND a.domain_id = 1
AND a.language = 'de'
AND a.id = b.content_id
AND b.log_action = 'deleteContent'
AND b.log_timestamp = (
SELECT max(log_timestamp)
FROM cms_log
WHERE content_id = a.id
AND b.domain_id = a.domain_id
AND b.language = a.language
)
ORDER BY timestamp
I have already analyzed, reindexed, vacuumed (FULL...) the tables and
both
databases with no results. Indices in database 2 exist and are not
broken, they are used for simple querys such as "select * from cms_log
where log_action ='XYZ'".
I have even restarted the postgres server to exclude any "opened -
curser" based problems.
Here is what i get when doing an expain for the big select query:
_______________________________________
Database 1 (it is obviously using the indices)
_______________________________________
Sort (cost=36615.14..36615.14 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..36615.13 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Index Scan using cms_log_log_action on cms_log b
(cost=0.00..451.79 rows=120 width=22)
Index Cond: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..213.55 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1)
AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Materialize (cost=207.58..207.58 rows=281
width=4)
-> Index Scan using cms_contents_state_id on
cms_contents (cost=0.00..207.58 rows=281
width=4)
Index Cond: (state_id = 'd'::bpchar)
Filter: (("language" = 'de'::bpchar) AND
(domain_id
= 1))
SubPlan
-> Aggregate (cost=86.80..86.80 rows=1 width=8)
-> Result (cost=0.00..86.73 rows=27 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log
(cost=0.00..86.73 rows=27 width=8)
Index Cond: (content_id = $0)
_______________________________________
Database 2 (it is obviously not using all indices)
_______________________________________
Sort (cost=337657.43..337657.43 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..337657.42 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Seq Scan on cms_log b (cost=0.00..2492.53 rows=1491
width=22)
Filter: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..172.78 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Seq Scan on cms_contents (cost=0.00..333.94
rows=2347 width=4)
Filter: ((state_id = 'd'::bpchar) AND
("language" = 'de'::bpchar) AND (domain_id = 1))
SubPlan
-> Aggregate (cost=51.95..51.95 rows=1 width=8)
-> Result (cost=0.00..51.89 rows=24 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log (cost=0.00..51.89 rows=24 width=8)
Index Cond: (content_id = $0)
_____________________________________________________________
It seems to me that the query planner has some sort of bug when
predicting the costs for the query.
If i set SET ENABLE_SEQSCAN = NO; before running the query in database
2,
this is what i get from the planner:
______________________________________________________________
Sort (cost=826278.50..826278.51 rows=1 width=36)
Sort Key: b.log_timestamp
-> Nested Loop (cost=0.00..826278.49 rows=1 width=36)
Join Filter: ("outer".log_timestamp = (subplan))
-> Index Scan using cms_log_log_action on cms_log b
(cost=0.00..3539.29 rows=1491 width=22)
Index Cond: (log_action = 'deleteContent'::character
varying)
-> Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..499.72 rows=1 width=14)
Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
Filter: ((state_id = 'd'::bpchar) AND (subplan))
SubPlan
-> Materialize (cost=493.92..493.92 rows=2347
width=4)
-> Index Scan using cms_contents_state_id on
cms_contents (cost=0.00..493.92 rows=2347 width=4)
Index Cond: (state_id = 'd'::bpchar)
Filter: (("language" = 'de'::bpchar) AND
(domain_id = 1))
SubPlan
-> Aggregate (cost=51.95..51.95 rows=1 width=8)
-> Result (cost=0.00..51.89 rows=24 width=8)
One-Time Filter: (($1 = $2) AND ($3 = $4))
-> Index Scan using cms_log_content_id on
cms_log (cost=0.00..51.89 rows=24 width=8)
Index Cond: (content_id = $0)
______________________________________________________
So the planner predicts that the costs for the select with index usage
are about 3 times higher than without index usage, whereas in fact it
is a lot faster.
How can i improve the predicion? Am i to change the query?
Here are the table layouts:
______________________________________________
cms_contents:
______________________________________________
Column | Type | Modifiers
-----------------------+------------------------+-------------------------------
id | integer | not null
domain_id | integer | not null
language | character(2) | not null
state_id | character(1) | not null
type_name | character varying(50) | not null
parent_id | integer | not null
sort_children_by | character varying(100) | not null default
'order_hint'
show_children_in_tree | bit(1) | not null default
B'1'::"bit"
is_online | bit(1) |
Indexes: pk__cms_contents primary key btree (id, domain_id,
"language"),
cms_contents_children_index btree (domain_id, parent_id,
"language", state_id, is_online),
cms_contents_state_id btree (state_id)
______________________________________________
cms_log:
______________________________________________
Column | Type | Modifiers
---------------+-----------------------------+------------------------
content_id | integer | not null
domain_id | integer | not null
language | character(2) | not null
log_timestamp | timestamp without time zone | not null default now()
log_user | character varying(50) | not null
log_action | character varying(50) | not null
Indexes: cms_log_content_id btree (content_id),
cms_log_content_id_domain_action_language btree (content_id,
domain_id, "language", log_action),
cms_log_log_action btree (log_action),
cms_log_log_timestamp btree (log_timestamp)
Here are the table statistics:
database 1:
cms_contents: 13109 entries
cms_log: 119166 entries
database 2:
cms_contents: 10436 entries
cms_log: 105922 entries
Any advice appreciated,
Kind Regards,
R. Willmington
From | Date | Subject | |
---|---|---|---|
Next Message | Allen Smith | 2004-11-03 19:39:23 | Help need to restore dropped db |
Previous Message | Werner Bohl | 2004-11-02 19:23:30 | Postgresql beta4 rpms |