From: | "Nick Fankhauser" <nickf(at)ontko(dot)com> |
---|---|
To: | "pgsql-admin" <pgsql-admin(at)postgresql(dot)org> |
Subject: | A plan returned by explain doesn't make sense to me |
Date: | 2002-04-05 13:48:58 |
Message-ID: | NEBBLAAHGLEEPCGOBHDGIEFNELAA.nickf@ontko.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hi-
I'm trying to squeeze better performance out of a query, but the explain
returns a plan that really puzzles me.
A little background is in order- the table that I think is central to my
problem is what I call an "assignment table" I'm working with court data, so
I have people and cases. A person can be involved in 0 or more cases and a
case can involve 1 or more persons. To represent this, I have the
actor_case_assignment table, which is essentially three fields- a primary
key, an actor_id (person), and a case_id. I have four indexes: One for each
id, and then a *unique* index on (actor_id, case_id) because although either
id on it's own may be duplicated many times in the assignment table, no
person should ever be assigned to a case more than once,
The problem is that I have a view using this table that runs way too slow.
It looks like this:
create view criminal_detail as
select case_data.case_id,
case_data.court_id,
actor_case_assignment.actor_id,
charge.charge_count,
charge.charge_number,
charge.charge_type,
charge.charge_version,
charge.charge_class,
charge.charge_text,
charge.charge_qualifier,
charge.charge_code,
charge.charge_desc,
charge.plea_code,
charge.plea_desc,
charge.charge_date,
charge.statute_reference,
charge.charge_status_desc,
disposition_number,
cr_disposition_date,
cr_disposition_desc,
cr_disposition_code,
cr_disposition_type,
cr_disposition_mode
from (charge left outer join criminal_disposition on
(charge.charge_id = criminal_disposition.charge_id)
),
actor_case_assignment,
case_data
where actor_case_assignment.actor_case_assignment_id =
charge.actor_case_assignment_id
and actor_case_assignment.case_id = case_data.case_id
;
The explain looks like this:
develop=# explain SELECT * FROM CRIMINAL_DETAIL WHERE case_id =
'102SC01353';
NOTICE: QUERY PLAN:
Hash Join (cost=155.06..9263.84 rows=1 width=320)
-> Hash Join (cost=120.53..6034.05 rows=79880 width=260)
-> Seq Scan on charge (cost=0.00..2664.80 rows=79880 width=184)
-> Hash (cost=109.82..109.82 rows=4282 width=76)
-> Seq Scan on criminal_disposition (cost=0.00..109.82
rows=4282 width=76)
-> Hash (cost=34.53..34.53 rows=4 width=60)
-> Nested Loop (cost=0.00..34.53 rows=4 width=60)
-> Index Scan using case_data_case_id on case_data
(cost=0.00..4.01 rows=1 width=24)
-> Index Scan using actor_case_assignment_both on
actor_case_assignment (cost=0.00..30.42 rows=7 width=36)
The part that looks wrong to me is that on the last line of the explain, it
is using the index "actor_case_assignment_both" (the index on
actor_id,case_id) even though actor_id is never referenced in this query and
and index on case_id alone is available.
Looking from the outside in, this query should be very fast- an index scan
on case_id reduces the number of matches in actor_case_assignment to about
5, and then then joins these 5 rows to other tables based on indexed primary
(unique) keys. The question is, how can I re-arrange my query to help the
planner come to the same conclusion, and most puzzling, why does the planner
choose an index that involves actor_id?
Many thanks to those of you who read through all of this! Any suggestions?
-Nick
--------------------------------------------------------------------------
Nick Fankhauser nickf(at)ontko(dot)com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Nick Fankhauser | 2002-04-05 14:33:18 | error in my last post |
Previous Message | postgresql | 2002-04-05 11:43:26 | Re: connections |