Re: Performance Optimization for Dummies 2 - the SQL

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Date: 2006-10-03 16:58:30
Message-ID: efu4se$tgj$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> explain analyze is more helpful because it prints the times.

Sorry, this runs in-line in my code, and I didn't want to slow the
already-slow program with explain analyze. I have run it outside of the code
in its own query. The new results are below.

> sample 1, couple questions:
> what is the purpose of limit 1?

I don't need to know the results, I just need to know if any data which
meets this criteria exists.

> if you break up the 'or' which checks facility and address into two
> separate queries, are the two queries total times more, less, or same
> as the large query.

They are much less; I had assumed that SQL would use lazy evaluation in this
case, not bothering to perform one half of the OR condition if the other
half But the single query is much heavier than the two seperate ones.

Carlo

>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
select
f.facility_id,
provider_practice_id
from
mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
on fa.facility_id = pp.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
pp.provider_id = 1411311
and f.facility_type_code != 'P'
and (
pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' = a.postal_code||'%'
and a.city = 'New York'
) or (
f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
)
limit 1

"Limit (cost=3899.18..22561.46 rows=1 width=8) (actual
time=9410.970..9410.970 rows=0 loops=1)"
" -> Hash Join (cost=3899.18..97210.58 rows=5 width=8) (actual
time=9410.966..9410.966 rows=0 loops=1)"
" Hash Cond: ("outer".address_id = "inner".address_id)"
" Join Filter: ((("outer".provider_id = 1411311) AND
("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id
IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND
('10001-2382'::text = (("inner".postal_code)::text || '%' (..)"
" -> Merge Join (cost=0.00..51234.97 rows=801456 width=57) (actual
time=0.314..6690.241 rows=685198 loops=1)"
" Merge Cond: ("outer".facility_id = "inner".facility_id)"
" -> Merge Join (cost=0.00..15799.46 rows=128268 width=49)
(actual time=0.197..1637.553 rows=128268 loops=1)"
" Merge Cond: ("outer".facility_id =
"inner".facility_id)"
" -> Index Scan using facility_pkey on facility f
(cost=0.00..13247.94 rows=176864 width=41) (actual time=0.145..591.219
rows=126624 loops=1)"
" -> Index Scan using facility_address_facility_idx on
facility_address fa (cost=0.00..4245.12 rows=128268 width=8) (actual
time=0.041..384.632 rows=128268 loops=1)"
" -> Index Scan using provider_practice_facility_idx on
provider_practice pp (cost=0.00..30346.89 rows=489069 width=16) (actual
time=0.111..3031.675 rows=708714 loops=1)"
" -> Hash (cost=3650.54..3650.54 rows=99454 width=36) (actual
time=478.509..478.509 rows=99454 loops=1)"
" -> Seq Scan on address a (cost=0.00..3650.54 rows=99454
width=36) (actual time=0.033..251.203 rows=99454 loops=1)"
"Total runtime: 9412.654 ms"

----------
Sample 2:
This one includes a call to a custom function which performs lexical
comparisons
and returns a rating on the likelihood that the company names refer to the
same
facility. Replacing the code:
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp
with
1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
which leads me to believe my problem is in the SQL structure itself.
----------
select
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp,
facil.*
from (
select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
facility_address_id is not null
and a.country_code = 'US'
and a.state_code = 'IL'
and '60640-5759' like a.postal_code||'%'
union select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

"Sort (cost=21565.50..21567.78 rows=909 width=236) (actual
time=1622.448..1622.456 rows=12 loops=1)"
" Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3
Inc'::text, (name)::text)"
" -> Subquery Scan facil (cost=21484.47..21520.83 rows=909 width=236)
(actual time=1173.103..1622.134 rows=12 loops=1)"
" -> Unique (cost=21484.47..21509.47 rows=909 width=103) (actual
time=829.747..829.840 rows=12 loops=1)"
" -> Sort (cost=21484.47..21486.75 rows=909 width=103)
(actual time=829.744..829.761 rows=12 loops=1)"
" Sort Key: facility_id, facility_address_id, address_id,
facility_type_code, name, address, city, state_code, postal_code,
country_code"
" -> Append (cost=4645.12..21439.81 rows=909 width=103)
(actual time=146.952..829.517 rows=12 loops=1)"
" -> Nested Loop (cost=4645.12..8380.19 rows=22
width=103) (actual time=146.949..510.824 rows=12 loops=1)"
" -> Hash Join (cost=4645.12..8301.36
rows=22 width=72) (actual time=146.912..510.430 rows=12 loops=1)"
" Hash Cond: ("outer".address_id =
"inner".address_id)"
" -> Seq Scan on facility_address fa
(cost=0.00..3014.68 rows=128268 width=12) (actual time=0.007..238.228
rows=128268 loops=1)"
" Filter: (facility_address_id IS
NOT NULL)"
" -> Hash (cost=4645.08..4645.08
rows=17 width=64) (actual time=131.827..131.827 rows=3 loops=1)"
" -> Seq Scan on address a
(cost=0.00..4645.08 rows=17 width=64) (actual time=3.555..131.797 rows=3
loops=1)"
" Filter: ((country_code =
'US'::bpchar) AND ((state_code)::text = 'IL'::text) AND ('60640-5759'::text
~~ ((postal_code)::text || '%'::text)))"
" -> Index Scan using facility_pkey on
facility f (cost=0.00..3.57 rows=1 width=35) (actual time=0.021..0.023
rows=1 loops=12)"
" Index Cond: ("outer".facility_id =
f.facility_id)"
" -> Subquery Scan "*SELECT* 2"
(cost=0.00..13059.40 rows=887 width=73) (actual time=318.669..318.669 rows=0
loops=1)"
" -> Nested Loop Left Join
(cost=0.00..13050.53 rows=887 width=73) (actual time=318.664..318.664 rows=0
loops=1)"
" Filter: ("inner".facility_address_id
IS NULL)"
" -> Seq Scan on facility f
(cost=0.00..9438.13 rows=887 width=73) (actual time=4.468..318.364 rows=10
loops=1)"
" Filter: ((default_country_code
= 'US'::bpchar) AND ('60640-5759'::text ~~ ((default_postal_code)::text ||
'%'::text)))"
" -> Index Scan using
facility_address_facility_idx on facility_address fa (cost=0.00..4.05
rows=2 width=8) (actual time=0.018..0.022 rows=1 loops=10)"
" Index Cond: (fa.facility_id =
"outer".facility_id)"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message felix 2006-10-03 17:29:11 Re: Unsubscribe
Previous Message Darcy Buskermolen 2006-10-03 15:35:05 Re: Poor performance on very simple query ?