How to get explain plan to prefer Hash Join

From: atxcanadian <matthew(dot)boyda(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: How to get explain plan to prefer Hash Join
Date: 2015-03-11 17:01:04
Message-ID: 1426093264141-5841450.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Currently seeing massive increase in performance when optimizer chooses Hash
Join over Nested Loops. I achieve this by temporarily setting nested loops
off. I'd like to setup some database variables where the optimizer prefers
hash joins. Any suggestions?

*Query in question:*
explain analyze
select dp.market_day, dp.hour_ending, dp.repeated_hour_flag,
dp.settlement_point, sum(dp.mw) dp_mw
from dp_hist_gen_actual dp
Inner Join api_settlement_points sp on
sp.settlement_point = dp.settlement_point and
sp.settlement_point_rdfid =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}' and
sp.start_date <= '2015-01-01'::date and
sp.end_date > '2015-01-01'::date and
sp.rt_model = (select case when c.rt_model_loaded = 2 then true
else false end
from cim_calendar c where c.nodal_load <=
'2015-01-01'::date
order by c.cim desc limit 1)
where dp.market_day BETWEEN '2015-01-01'::date and
'2015-01-01'::date and
dp.expiry_date is null
group by dp.market_day, dp.hour_ending, dp.repeated_hour_flag,
dp.settlement_point;

*Nested Loop Explain Analyze Output:*
HashAggregate (cost=58369.29..58369.30 rows=1 width=24) (actual
time=496287.249..496287.257 rows=24 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=8.30..8.30 rows=1 width=9) (actual time=0.145..0.145
rows=1 loops=1)
-> Sort (cost=8.30..8.78 rows=193 width=9) (actual
time=0.145..0.145 rows=1 loops=1)
Sort Key: c.cim
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on cim_calendar c (cost=0.00..7.33 rows=193
width=9) (actual time=0.007..0.075 rows=192 loops=1)
Filter: (nodal_load <= '2015-01-01'::date)
Rows Removed by Filter: 36
-> * Nested Loop (cost=0.99..58360.98 rows=1 width=24) (actual
time=883.718..496287.058 rows=24 loops=1)*
Join Filter: ((dp.settlement_point)::text =
(sp.settlement_point)::text)
Rows Removed by Join Filter: 12312
-> Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual
dp (cost=0.56..2.78 rows=1 width=24) (actual time=0.020..20.012 rows=12336
loops=1)
Index Cond: ((market_day >= '2015-01-01'::date) AND
(market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
-> Index Scan using api_settlement_points_idx on
api_settlement_points sp (cost=0.43..58358.05 rows=12 width=9) (actual
time=39.066..40.223 rows=1 loops=12336)
Index Cond: ((rt_model = $0) AND (start_date <=
'2015-01-01'::date) AND (end_date > '2015-01-01'::date))
Filter: ((settlement_point_rdfid)::text =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
Rows Removed by Filter: 5298
*Total runtime: 496287.325 ms*

*Hash Join Explain Analyze Output:*
HashAggregate (cost=58369.21..58369.22 rows=1 width=24) (actual
time=50.835..50.843 rows=24 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=8.30..8.30 rows=1 width=9) (actual time=0.149..0.149
rows=1 loops=1)
-> Sort (cost=8.30..8.78 rows=193 width=9) (actual
time=0.148..0.148 rows=1 loops=1)
Sort Key: c.cim
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on cim_calendar c (cost=0.00..7.33 rows=193
width=9) (actual time=0.009..0.082 rows=192 loops=1)
Filter: (nodal_load <= '2015-01-01'::date)
Rows Removed by Filter: 36
-> *Hash Join (cost=3.23..58360.90 rows=1 width=24) (actual
time=49.644..50.811 rows=24 loops=1)*
Hash Cond: ((sp.settlement_point)::text =
(dp.settlement_point)::text)
-> Index Scan using api_settlement_points_idx on
api_settlement_points sp (cost=0.43..58358.05 rows=12 width=9) (actual
time=39.662..40.822 rows=1 loops=1)
Index Cond: ((rt_model = $0) AND (start_date <=
'2015-01-01'::date) AND (end_date > '2015-01-01'::date))
Filter: ((settlement_point_rdfid)::text =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
Rows Removed by Filter: 5298
-> Hash (cost=2.78..2.78 rows=1 width=24) (actual
time=9.962..9.962 rows=12336 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 684kB
-> Index Scan using dp_hist_gen_actual_idx2 on
dp_hist_gen_actual dp (cost=0.56..2.78 rows=1 width=24) (actual
time=0.023..5.962 rows=12336 loops=1)
Index Cond: ((market_day >= '2015-01-01'::date) AND
(market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
*Total runtime: 50.906 ms*

--
View this message in context: http://postgresql.nabble.com/How-to-get-explain-plan-to-prefer-Hash-Join-tp5841450.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2015-03-11 17:30:01 Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
Previous Message Tom Lane 2015-03-11 16:15:33 Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT