Why does adding SUM and GROUP BY destroy performance?

From: David Link <dvlink(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Why does adding SUM and GROUP BY destroy performance?
Date: 2003-09-17 17:51:36
Message-ID: 20030917175136.22623.qmail@web13504.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

Why does adding SUM and GROUP BY destroy performance?
details follow.
Thanks, David Link

s1.sql:
SELECT
t.tid, t.title,
COALESCE(s0c100r100.units, 0) as w0c100r100units,
(COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0))
as r0c2r100units
FROM
title t
JOIN upc u1 ON t.tid = u1.tid
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
AND s0c100r100.region = 100
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
AND r1c2r100.region = 100
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
AND y0c2r100.region = 100
LEFT OUTER JOIN media m ON t.media = m.key
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
t.distributor != 'CONTROL LABEL'
ORDER BY
t.title ASC
LIMIT 50
;

s2.sql:
SELECT
t.tid, t.title,
SUM(COALESCE(s0c100r100.units, 0)) as w0c100r100units,
SUM((COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0)))
as r0c2r100units
FROM
title t
JOIN upc u1 ON t.tid = u1.tid
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
AND s0c100r100.region = 100
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
AND r1c2r100.region = 100
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
AND y0c2r100.region = 100
LEFT OUTER JOIN media m ON t.media = m.key
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
t.distributor != 'CONTROL LABEL'
GROUP BY
t.tid, t.title
ORDER BY
t.title ASC
LIMIT 50
;

Times:
s1.sql takes 0m0.124s
s2.sql takes 1m1.450s

Stats:
title table: 68,000 rows
sale_200331 table: 150,000 rows
ytd_200331 table: 0 rows
rtd table: 650,000 rows

Indexes are in place.

s1 explain plan:
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..65105.51 rows=50 width=132)
-> Nested Loop (cost=0.00..91726868.54 rows=70445 width=132)
Join Filter: ("outer".screen_format = "inner"."key")
-> Nested Loop (cost=0.00..91651668.74 rows=70445 width=127)
Join Filter: ("outer".media = "inner"."key")
-> Nested Loop (cost=0.00..91578053.95 rows=70445
width=122)
-> Nested Loop (cost=0.00..91236359.89
rows=70445 width=98)
-> Nested Loop (cost=0.00..90894665.82
rows=70445 width=74)
-> Nested Loop
(cost=0.00..90539626.76 rows=70445 width=50)
-> Index Scan using
title_title_ind on title t (cost=0.00..193051.67 rows=68775 width=38)
Filter: (distributor <>
'CONTROL LABEL'::character varying)
-> Index Scan using
davids_tid_index on upc u1 (cost=0.00..1309.24 rows=353 width=12)
Index Cond: ("outer".tid =
u1.tid)
-> Index Scan using
sale_200331_upc_wk_chl_reg_ind on sale_200331 s0c100r100
(cost=0.00..5.02 rows=1 width=24)
Index Cond: (("outer".upc =
s0c100r100.upc) AND (s0c100r100.week = 200331) AND (s0c100r100.channel
= 100) AND (s0c100r100.region = 100))
-> Index Scan using
rtd_upc_year_chl_reg_ind on rtd r1c2r100 (cost=0.00..4.83 rows=1
width=24)
Index Cond: (("outer".upc =
r1c2r100.upc) AND (r1c2r100."year" = 2002) AND (r1c2r100.channel = 2)
AND (r1c2r100.region = 100))
-> Index Scan using ytd_200331_upc_wkchlreg_ind
on ytd_200331 y0c2r100 (cost=0.00..4.83 rows=1 width=24)
Index Cond: (("outer".upc = y0c2r100.upc)
AND (y0c2r100.week = 200331) AND (y0c2r100.channel = 2) AND
(y0c2r100.region = 100))
-> Seq Scan on media m (cost=0.00..1.02 rows=2
width=5)
-> Seq Scan on screen_format sf (cost=0.00..1.03 rows=3
width=5)
(21 rows)

s2 explain plan:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=403996.99..403997.11 rows=50 width=132)
-> Sort (cost=403996.99..404014.60 rows=7044 width=132)
Sort Key: t.title
-> Aggregate (cost=402393.74..403274.30 rows=7044 width=132)
-> Group (cost=402393.74..402922.08 rows=70445
width=132)
-> Sort (cost=402393.74..402569.86 rows=70445
width=132)
Sort Key: t.tid, t.title
-> Hash Join (cost=375382.76..392011.46
rows=70445 width=132)
Hash Cond: ("outer".screen_format =
"inner"."key")
-> Hash Join
(cost=375381.72..390997.78 rows=70445 width=127)
Hash Cond: ("outer".media =
"inner"."key")
-> Merge Join
(cost=375380.70..390057.49 rows=70445 width=122)
Merge Cond: ("outer".upc =
"inner".upc)
Join Filter:
(("inner".week = 200331) AND ("inner".channel = 2) AND ("inner".region
= 100))
-> Merge Join
(cost=375380.70..382782.40 rows=70445 width=98)
Merge Cond:
("outer".upc = "inner".upc)
Join Filter:
(("inner"."year" = 2002) AND ("inner".channel = 2) AND ("inner".region
= 100))
-> Sort
(cost=375310.87..375486.98 rows=70445 width=74)
Sort Key:
u1.upc
-> Nested
Loop (cost=6348.20..367282.53 rows=70445 width=74)
-> Hash
Join (cost=6348.20..12243.46 rows=70445 width=50)

Hash Cond: ("outer".tid = "inner".tid)
->
Seq Scan on upc u1 (cost=0.00..2795.28 rows=70628 width=12)
->
Hash (cost=4114.93..4114.93 rows=68775 width=38)

-> Seq Scan on title t (cost=0.00..4114.93 rows=68775 width=38)

Filter: (distributor <> 'CONTROL LABEL'::character varying)
->
Index Scan using sale_200331_upc_wk_chl_reg_ind on sale_200331
s0c100r100 (cost=0.00..5.02 rows=1 width=24)

Index Cond: (("outer".upc = s0c100r100.upc) AND (s0c100r100.week =
200331) AND (s0c100r100.channel = 100) AND (s0c100r100.region = 100))
-> Sort
(cost=69.83..72.33 rows=1000 width=24)
Sort Key:
r1c2r100.upc
-> Seq Scan
on rtd r1c2r100 (cost=0.00..20.00 rows=1000 width=24)
-> Index Scan using
ytd_200331_upc_wkchlreg_ind on ytd_200331 y0c2r100 (cost=0.00..52.00
rows=1000 width=24)
-> Hash (cost=1.02..1.02
rows=2 width=5)
-> Seq Scan on media m
(cost=0.00..1.02 rows=2 width=5)
-> Hash (cost=1.03..1.03 rows=3
width=5)
-> Seq Scan on screen_format sf
(cost=0.00..1.03 rows=3 width=5)
(36 rows)

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Darko Prenosil 2003-09-17 18:06:40 Re: psql and blob
Previous Message Kyle 2003-09-17 17:45:35 Job opportunity for PostgreSQL developer