JOIN performance

From: "Dean Gibson (DB Administrator)" <postgresql3(at)ultimeth(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: JOIN performance
Date: 2004-09-20 19:24:49
Message-ID: 5.1.0.14.2.20040920121606.00a89c68@imaps.mailpen.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a view that when used, is slow:

CREATE VIEW "Data".genlic_a4avail AS
SELECT
genlic_a4.*,
last_action_date,
end_date,
canceled
FROM genlic_a4
LEFT JOIN lic_hd USING( sys_id )
WHERE status != 'A';

Here is the EXPLAIN output:

Merge Join (cost=155360.47..159965.70 rows=13063 width=75)
Merge Cond: ("outer".sys_id = "inner".sys_id)
-> Sort (cost=3912.51..3916.48 rows=1589 width=62)
Sort Key: "_GenLicGroupA4".sys_id
-> Nested Loop (cost=0.00..3828.04 rows=1589 width=62)
Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
-> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40
rows=1589 width=21)
Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12))
-> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16
width=41)
-> Sort (cost=262032.96..264249.96 rows=886799 width=72)
Sort Key: lic_hd.sys_id
-> Subquery Scan lic_hd (cost=0.00..24529.99 rows=886799 width=72)
-> Seq Scan on "_LicHD" (cost=0.00..24529.99 rows=886799
width=72)

If I change the view to this:

CREATE VIEW "Data".genlic_a4avail AS
SELECT
genlic_a4.*,
(SELECT last_action_date FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
last_action_date,
(SELECT end_date FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
end_date,
(SELECT canceled FROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS
canceled
FROM genlic_a4
WHERE status != 'A';

Then the performance is MUCH better:

Subquery Scan genlic_a4avail (cost=0.00..3828.04 rows=1589 width=62)
-> Nested Loop (cost=0.00..3828.04 rows=1589 width=62)
Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
-> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40 rows=1589
width=21)
Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12))
-> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16 width=41)
SubPlan
-> Limit (cost=0.00..3.01 rows=1 width=4)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=4)
Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8)
Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8)
-> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8)
Index Cond: (unique_system_identifier = $0)

Note that genlic_a4 is small (4519), and lic_hd is large (886799), and
lic_hd has sys_id as its PRIMARY KEY.

Is there a better way to write the LEFT JOIN so as to achieve the
performance of the second VIEW without the clumsiness of the three (SELECT
... LIMIT 1) ???

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message T E Schmitz 2004-09-20 19:50:34 Re: COUNT(*) to find records which have a certain number of
Previous Message Greg Stark 2004-09-20 18:33:33 Re: COUNT(*) to find records which have a certain number of