Performance Optimization for Dummies 2 - the SQL

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance Optimization for Dummies 2 - the SQL
Date: 2006-10-03 08:33:01
Message-ID: eft782$5a1$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Some very helpful people had asked that I post the troublesome code that was
generated by my import program.

I installed a SQL log feature in my import program. I have
posted samples of the SQL statements that cause the biggest delays.

Thanks for all of your help.

Carlo

----------
Sample 1:
This one is very expensive on my system.
----------
select
f.facility_id,
provider_practice_id
from
mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
on fa.facility_id = pp.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
pp.provider_id = 1411311
and f.facility_type_code != 'P'
and (
pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' = a.postal_code||'%'
and a.city = 'New York'
) or (
f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
)
limit 1

Limit (cost=3899.18..32935.21 rows=1 width=8)
-> Hash Join (cost=3899.18..91007.27 rows=3 width=8)
Hash Cond: ("outer".address_id = "inner".address_id)
Join Filter: ((("outer".provider_id = 1411311) AND
("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id
IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND
('10001-2382'::text = (("inner".postal_code)::text || '%'::text)) AND
(("inner".city)::text = 'New York'::text)) OR (("outer".default_state_code =
'NY'::bpchar) AND ('10001-2382'::text ~~
(("outer".default_postal_code)::text || '%'::text)) AND
(("outer".default_city)::text = 'New York'::text)))
-> Merge Join (cost=0.00..50589.20 rows=695598 width=57)
Merge Cond: ("outer".facility_id = "inner".facility_id)
-> Merge Join (cost=0.00..16873.90 rows=128268 width=49)
Merge Cond: ("outer".facility_id = "inner".facility_id)
-> Index Scan using facility_pkey on facility f
(cost=0.00..13590.18 rows=162525 width=41)
-> Index Scan using facility_address_facility_idx on
facility_address fa (cost=0.00..4254.46 rows=128268 width=8)
-> Index Scan using provider_practice_facility_idx on
provider_practice pp (cost=0.00..28718.27 rows=452129 width=16)
-> Hash (cost=3650.54..3650.54 rows=99454 width=36)
-> Seq Scan on address a (cost=0.00..3650.54 rows=99454
width=36)

----------
Sample 2:
This one includes a call to a custom function which performs lexical
comparisons
and returns a rating on the likelihood that the company names refer to the
same
facility. Replacing the code:
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp
with
1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
which leads me to believe my problem is in the SQL structure itself.

----------
select
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp,
facil.*
from (
select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
facility_address_id is not null
and a.country_code = 'US'
and a.state_code = 'IL'
and '60640-5759' like a.postal_code||'%'
union select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort (cost=20595.92..20598.01 rows=834 width=236)
Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3
Inc'::text, (name)::text)
-> Subquery Scan facil (cost=20522.10..20555.46 rows=834 width=236)
-> Unique (cost=20522.10..20545.03 rows=834 width=103)
-> Sort (cost=20522.10..20524.18 rows=834 width=103)
Sort Key: facility_id, facility_address_id, address_id,
facility_type_code, name, address, city, state_code, postal_code,
country_code
-> Append (cost=4645.12..20481.63 rows=834 width=103)
-> Nested Loop (cost=4645.12..8381.36 rows=21
width=103)
-> Hash Join (cost=4645.12..8301.35
rows=21 width=72)
Hash Cond: ("outer".address_id =
"inner".address_id)
-> Seq Scan on facility_address fa
(cost=0.00..3014.68 rows=128268 width=12)
Filter: (facility_address_id IS
NOT NULL)
-> Hash (cost=4645.08..4645.08
rows=16 width=64)
-> Seq Scan on address a
(cost=0.00..4645.08 rows=16 width=64)
Filter: ((country_code =
'US'::bpchar) AND ((state_code)::text = 'IL'::text) AND ('60640-5759'::text
~~ ((postal_code)::text || '%'::text)))
-> Index Scan using facility_pkey on
facility f (cost=0.00..3.80 rows=1 width=35)
Index Cond: ("outer".facility_id =
f.facility_id)
-> Subquery Scan "*SELECT* 2"
(cost=0.00..12100.07 rows=813 width=73)
-> Nested Loop Left Join
(cost=0.00..12091.94 rows=813 width=73)
Filter: ("inner".facility_address_id
IS NULL)
-> Seq Scan on facility f
(cost=0.00..8829.19 rows=813 width=73)
Filter: ((default_country_code =
'US'::bpchar) AND ('60640-5759'::text ~~ ((default_postal_code)::text ||
'%'::text)))
-> Index Scan using
facility_address_facility_idx on facility_address fa (cost=0.00..3.99
rows=2 width=8)
Index Cond: (fa.facility_id =
"outer".facility_id)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Carlo Stonebanks 2006-10-03 09:32:24 Re: Performace Optimization for Dummies
Previous Message Markus Schaber 2006-10-03 08:20:45 Re: Performace Optimization for Dummies