From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: costing of hash join |
Date: | 2014-01-03 22:50:55 |
Message-ID: | 27396.1388789455@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I'm trying to figure out why hash joins seem to be systematically underused
> in my hands. In the case I am immediately looking at, it prefers a merge
> join with both inputs getting seq scanned and sorted, despite the hash join
> actually being 2 to 3 times faster, where inputs and intermediate working
> sets are all in memory. I normally wouldn't worry about a factor of 3
> error, but I see this a lot in many different situations. The row
> estimates are very close to actual; the error is only in the cpu estimates.
Can you produce a test case for other people to look at?
What datatype(s) are the join keys?
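For reference, a minimal sketch of the kind of self-contained test case being asked for might look like the following; the table names, row counts, padding columns, integer join key, and work_mem setting are illustrative assumptions, not details from Jeff's actual workload.

```sql
-- Illustrative reproduction sketch: two synthetic tables joined on an
-- integer key, with work_mem raised so the hash table stays in memory.
SET work_mem = '256MB';
CREATE TABLE outer_t (id int, pad text);
CREATE TABLE inner_t (id int, pad text);
INSERT INTO outer_t SELECT g, repeat('x', 20) FROM generate_series(1, 1000000) g;
INSERT INTO inner_t SELECT g, repeat('y', 20) FROM generate_series(1, 1000000) g;
ANALYZE outer_t;
ANALYZE inner_t;

-- Compare the planner's preferred plan against a forced hash join.
EXPLAIN ANALYZE SELECT count(*) FROM outer_t o JOIN inner_t i ON o.id = i.id;
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT count(*) FROM outer_t o JOIN inner_t i ON o.id = i.id;
RESET enable_mergejoin;
```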
> A hash join is charged cpu_tuple_cost for each inner tuple for inserting it
> into the hash table:
Doesn't seem like monkeying with that is going to account for a 3x error.
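For scale, the charge Jeff points at is cpu_tuple_cost (0.01 by default) per inner row, so a million-row inner side contributes about 10,000 cost units for building the hash table. Whether the plan choice is even sensitive to that parameter can be checked per session; a rough sketch against the hypothetical tables above:

```sql
-- Sketch: check whether shrinking the per-tuple CPU charge flips the plan.
-- Caveat: cpu_tuple_cost also feeds other cost terms (e.g. per-row scan
-- costs), not only the hashtable-insertion charge discussed above.
SHOW cpu_tuple_cost;          -- 0.01 by default
EXPLAIN SELECT count(*) FROM outer_t o JOIN inner_t i ON o.id = i.id;
SET cpu_tuple_cost = 0.005;   -- halve the per-tuple charge
EXPLAIN SELECT count(*) FROM outer_t o JOIN inner_t i ON o.id = i.id;
RESET cpu_tuple_cost;
```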
Have you tried using perf or oprofile or similar to see where the time is
actually, rather than theoretically, going?
regards, tom lane