Query planning question

From: "John Lister" <john(dot)lister-ps(at)kickstone(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Query planning question
Date: 2009-05-11 14:43:16
Message-ID: CDFB6E649FAB42D69FC5D4456A0F44E1@squarepi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Doing the following query

select distinct m.id, m.name
from manufacturer_manufacturer m
join product_product p on (p.manufacturer_id=m.id)
join retailer_offer o on (o.product_id=p.id)
where o.retailer_id=XXX and o.active

results in one of 2 query plans depending upon the value of XXX.
The first ignores the index on products and does a hash join which is very slow, the second uses the index and does a nested loop which is fast.

Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits, would tweaking the costs fix this or would i be better updating the stats for the product_id and manufacturer_id fields?

"Unique (cost=318308.62..321110.94 rows=1029 width=13) (actual time=5057.271..5296.973 rows=699 loops=1)"
" -> Sort (cost=318308.62..319242.73 rows=373642 width=13) (actual time=5057.270..5196.780 rows=455733 loops=1)"
" Sort Key: m.id, m.name"
" Sort Method: external merge Disk: 11032kB"
" -> Hash Join (cost=110196.74..283725.63 rows=373642 width=13) (actual time=1706.287..3451.352 rows=455733 loops=1)"
" Hash Cond: (p.manufacturer_id = m.id)"
" -> Hash Join (cost=110163.59..278554.90 rows=373642 width=4) (actual time=1705.652..3230.879 rows=455733 loops=1)"
" Hash Cond: (o.product_id = p.id)"
" -> Bitmap Heap Scan on retailer_offer o (cost=9418.68..157960.21 rows=373642 width=4) (actual time=120.277..382.208 rows=455733 loops=1)"
" Recheck Cond: ((retailer_id = 1347) AND active)"
" -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..9325.27 rows=373642 width=0) (actual time=79.503..79.503 rows=455829 loops=1)"
" Index Cond: (retailer_id = 1347)"
" -> Hash (cost=59067.07..59067.07 rows=2540307 width=8) (actual time=1584.994..1584.994 rows=2540324 loops=1)"
" -> Seq Scan on product_product p (cost=0.00..59067.07 rows=2540307 width=8) (actual time=0.008..698.313 rows=2540324 loops=1)"
" -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)"
" -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.007..0.278 rows=1029 loops=1)"
"Total runtime: 5310.663 ms"

"Unique (cost=43237.52..43266.80 rows=1029 width=13) (actual time=190.978..196.625 rows=276 loops=1)"
" -> Sort (cost=43237.52..43247.28 rows=3903 width=13) (actual time=190.977..192.431 rows=11298 loops=1)"
" Sort Key: m.id, m.name"
" Sort Method: quicksort Memory: 1037kB"
" -> Hash Join (cost=134.14..43004.70 rows=3903 width=13) (actual time=5.006..155.188 rows=11298 loops=1)"
" Hash Cond: (p.manufacturer_id = m.id)"
" -> Nested Loop (cost=100.99..42917.88 rows=3903 width=4) (actual time=4.363..146.421 rows=11298 loops=1)"
" -> Bitmap Heap Scan on retailer_offer o (cost=100.99..13663.63 rows=3903 width=4) (actual time=4.345..29.871 rows=11298 loops=1)"
" Recheck Cond: ((retailer_id = 1710) AND active)"
" -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..100.02 rows=3903 width=0) (actual time=2.368..2.368 rows=11380 loops=1)"
" Index Cond: (retailer_id = 1710)"
" -> Index Scan using product_product_pkey on product_product p (cost=0.00..7.48 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=11298)"
" Index Cond: (p.id = o.product_id)"
" -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.634..0.634 rows=1029 loops=1)"
" -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.275 rows=1029 loops=1)"
"Total runtime: 196.716 ms"

Thanks

--

Got needs? Get Goblin'! - http://www.pricegoblin.co.uk/

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2009-05-11 15:06:08 Re: Query planning question
Previous Message Alvaro Herrera 2009-05-11 14:38:26 Re: Distinct oddity