From: | "Dean Gibson (DB Administrator)" <postgresql3(at)ultimeth(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | JOIN performance |
Date: | 2004-09-20 19:24:49 |
Message-ID: | 5.1.0.14.2.20040920121606.00a89c68@imaps.mailpen.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
I have a view that when used, is slow:
CREATE VIEW "Data".genlic_a4avail AS
SELECT
genlic_a4.*,
last_action_date,
end_date,
canceled
FROM genlic_a4
LEFT JOIN lic_hd USING( sys_id )
WHERE status != 'A';
Here is the EXPLAIN output:
Merge Join (cost=155360.47..159965.70 rows=13063 width=75)
Merge Cond: ("outer".sys_id = "inner".sys_id)
-> Sort (cost=3912.51..3916.48 rows=1589 width=62)
Sort Key: "_GenLicGroupA4".sys_id
-> Nested Loop (cost=0.00..3828.04 rows=1589 width=62)
Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
-> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40
rows=1589 width=21)
Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12))
-> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16
width=41)
-> Sort (cost=262032.96..264249.96 rows=886799 width=72)
Sort Key: lic_hd.sys_id
-> Subquery Scan lic_hd (cost=0.00..24529.99 rows=886799 width=72)
-> Seq Scan on "_LicHD" (cost=0.00..24529.99 rows=886799
width=72)
If I change the view to this:
CREATE VIEW "Data".genlic_a4avail AS
SELECT
genlic_a4.*,
(SELECT last_action_date FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
last_action_date,
(SELECT end_date FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
end_date,
(SELECT canceled FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
canceled
FROM genlic_a4
WHERE status != 'A';
Then the performance is MUCH better:
Subquery Scan genlic_a4avail (cost=0.00..3828.04 rows=1589 width=62)
-> Nested Loop (cost=0.00..3828.04 rows=1589 width=62)
Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
-> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40 rows=1589
width=21)
Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12))
-> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16 width=41)
SubPlan
-> Limit (cost=0.00..3.01 rows=1 width=4)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=4)
Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8)
Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8)
Index Cond: (unique_system_identifier = $0)
Note that genlic_a4 is small (4519), and lic_hd is large (886799), and
lic_hd has sys_id as its PRIMARY KEY.
Is there a better way to write the LEFT JOIN so as to achieve the
performance of the second VIEW without the clumsiness of the three (SELECT
... LIMIT 1) ???
From | Date | Subject | |
---|---|---|---|
Next Message | T E Schmitz | 2004-09-20 19:50:34 | Re: COUNT(*) to find records which have a certain number of |
Previous Message | Greg Stark | 2004-09-20 18:33:33 | Re: COUNT(*) to find records which have a certain number of |