same query different execution plan (hash join vs. semi-hash join)

From: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: same query different execution plan (hash join vs. semi-hash join)
Date: 2014-05-16 02:38:22
Message-ID: D83E55F5F4D99B4A9B4C4E259E6227CD01498C88@AUX1EXC01.apac.experian.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi buddies,

I've got a query as below, it runs several times with different execution plan and totally different execution time. The one using hash-join is slow and the one using semi-hash join is very fast. However, I have no control over the optimizer behavior of PostgreSQL database. Or, do I have?

The database version is 9.3.4

SELECT dem_type,
dem_value,
Count(*)
FROM demo_weekly a
WHERE date = '2013-11-30'
AND userid IN ( select userid from test1)
AND dem_type IN ( 'Gender', 'Age', 'Hobbies' )
GROUP BY dem_type,
dem_value ;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=322386.94..322786.94 rows=40000 width=29) (actual time=3142.849..3142.927 rows=19 loops=1)
-> Hash Semi Join (cost=14460.06..314403.08 rows=1064514 width=29) (actual time=803.671..2786.979 rows=1199961 loops=1)
Hash Cond: ((a.userid)::text = (test1.userid)::text)
-> Append (cost=0.00..277721.30 rows=2129027 width=78) (actual time=536.829..1691.270 rows=2102611 loops=1)
-> Seq Scan on demo_weekly a (cost=0.00..0.00 rows=1 width=808) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((date = '2013-11-30'::date) AND ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[])))
-> Bitmap Heap Scan on demo_weekly_20131130 a_1 (cost=50045.63..277721.30 rows=2129026 width=78) (actual time=536.826..1552.203 rows=2102611 loops=1)
Recheck Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))
Filter: (date = '2013-11-30'::date)
-> Bitmap Index Scan on demo_weekly_20131130_dt_idx (cost=0.00..49513.37 rows=2129026 width=0) (actual time=467.453..467.453 rows=2102611 loops=1)
Index Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))
-> Hash (cost=8938.36..8938.36 rows=441736 width=50) (actual time=266.501..266.501 rows=441736 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 35541kB
-> Seq Scan on test1 (cost=0.00..8938.36 rows=441736 width=50) (actual time=0.023..87.869 rows=441736 loops=1)
Total runtime: 3149.004 ms
(15 rows)

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=318351.90..318751.90 rows=40000 width=29) (actual time=23668.646..23668.723 rows=19 loops=1)
-> Hash Join (cost=5316.68..310497.81 rows=1047212 width=29) (actual time=1059.182..23218.864 rows=1199961 loops=1)
Hash Cond: ((a.userid)::text = (test1.userid)::text)
-> Append (cost=0.00..276382.82 rows=2094423 width=78) (actual time=528.116..2002.462 rows=2102611 loops=1)
-> Seq Scan on demo_weekly a (cost=0.00..0.00 rows=1 width=808) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date = '2013-11-30'::date) AND ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[])))
-> Bitmap Heap Scan on demo_weekly_20131130 a_1 (cost=49269.46..276382.82 rows=2094422 width=78) (actual time=528.114..1825.265 rows=2102611 loops=1)
Recheck Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))
Filter: (date = '2013-11-30'::date)
-> Bitmap Index Scan on demo_weekly_20131130_dt_idx (cost=0.00..48745.85 rows=2094422 width=0) (actual time=458.694..458.694 rows=2102611 loops=1)
Index Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))
-> Hash (cost=5314.18..5314.18 rows=200 width=516) (actual time=530.930..530.930 rows=441736 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 35541kB
-> HashAggregate (cost=5312.18..5314.18 rows=200 width=516) (actual time=298.301..411.734 rows=441736 loops=1)
-> Seq Scan on test1 (cost=0.00..5153.94 rows=63294 width=516) (actual time=0.068..91.378 rows=441736 loops=1)
Total runtime: 23679.096 ms
(16 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-05-16 02:58:02 Re: same query different execution plan (hash join vs. semi-hash join)
Previous Message Tomas Vondra 2014-05-15 23:27:21 Re: Stats collector constant I/O