From: | "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua Tolley" <eggyknap(at)gmail(dot)com> |
Cc: | "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Bryce Cutt" <pandasuit(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date: | 2009-02-26 16:52:37 |
Message-ID: | 6EEA43D22289484890D119821101B1DF2C199B@exchange20.mercury.ad.ubc.ca |
Lists: | pgsql-hackers |
> From: Tom Lane
> Heikki's got a point here: the planner is aware that hashjoin doesn't
> like skewed distributions, and it assigns extra cost accordingly if it
> can determine that the join key is skewed. (See the "bucketsize" stuff
> in cost_hashjoin.) If this patch is accepted we'll want to tweak that
> code.
Those modifications would make the optimizer more likely to select hash
join, even with skewed distributions. For the TPC-H data set that we
are using, the optimizer always picks hash join over merge join (single
or multi-batch). Since the current patch does not change the cost
function, the planning cost is unchanged. Whether modifying the cost
function is worthwhile depends on its effect on planning cost.
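
To make the "bucketsize" point concrete, here is a rough sketch of how a
skew penalty of this kind can work. It is a simplified model, not the
planner's actual code: probe_cost, bucket_fraction and the 0.5 factor are
assumptions chosen for illustration. The idea is that each outer tuple is
charged for comparing against about half of the inner tuples expected in
its bucket, so a larger bucket fraction (more skew) means a higher cost.

```c
/*
 * Illustrative model only; names and constants are hypothetical and
 * are not taken from cost_hashjoin.
 *
 * outer_rows      - number of probe (outer) tuples
 * inner_rows      - number of build (inner) tuples
 * bucket_fraction - estimated fraction of inner tuples that land in
 *                   the bucket a probe tuple visits (larger = more skew)
 * per_tuple_cost  - cost of one hash-qual comparison
 */
static double
probe_cost(double outer_rows, double inner_rows,
           double bucket_fraction, double per_tuple_cost)
{
    double bucket_rows = inner_rows * bucket_fraction;

    /* Assume a probe always compares against at least one tuple. */
    if (bucket_rows < 1.0)
        bucket_rows = 1.0;

    /* Assume, on average, half of the bucket is scanned per probe. */
    return per_tuple_cost * outer_rows * bucket_rows * 0.5;
}
```

With the other inputs held fixed, raising bucket_fraction from 0.125 to
0.5 makes the estimate four times larger. A tweak to the cost function
would presumably shrink that penalty for the skewed values the patch
handles in memory.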
> Still, that has little to do with the current gating issue, which is
> whether we've convinced ourselves that the patch doesn't cause a
> performance decrease for cases in which it's unable to help.
Although we have not seen any overhead when the optimization is
bypassed, we are looking at some small code changes that would
guarantee that no extra statements are executed in the single-batch
case. Currently, an "if optimization_on" check is performed on each
probe tuple; the cost is minor, but it should be avoidable.
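
One way to avoid a per-tuple check is to test the flag once, outside the
probe loop, and then run a loop specialized for that case. The sketch
below only illustrates the idea. ProbeStats, probe_plain, probe_with_skew
and probe_all are hypothetical names, the counters stand in for the real
probe work, and none of this is the patch's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for real probe work. */
typedef struct ProbeStats
{
    long skew_checks;   /* probe tuples tested against the hot key */
    long probes;        /* probe tuples sent to the regular hash table */
} ProbeStats;

/* Optimization off: no skew-related statements run per tuple. */
static void
probe_plain(const int *keys, size_t n, ProbeStats *st)
{
    (void) keys;        /* a real probe would hash keys[i] here */
    for (size_t i = 0; i < n; i++)
        st->probes++;
}

/* Optimization on: each tuple is first tested against the hot key. */
static void
probe_with_skew(const int *keys, size_t n, int hot_key, ProbeStats *st)
{
    for (size_t i = 0; i < n; i++)
    {
        st->skew_checks++;
        if (keys[i] == hot_key)
            continue;   /* assumed to be handled by an in-memory table */
        st->probes++;
    }
}

/* The flag is tested once per call rather than once per probe tuple. */
static void
probe_all(const int *keys, size_t n, bool optimization_on,
          int hot_key, ProbeStats *st)
{
    if (optimization_on)
        probe_with_skew(keys, n, hot_key, st);
    else
        probe_plain(keys, n, st);
}
```

The trade-off is some duplicated loop code in exchange for a path in
which the flag is never tested per tuple when the optimization is off.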
The patch's author, Bryce Cutt, is defending his Master's thesis Friday
morning (on this work), so we will provide some updated code right after
that. Since these code changes are small, they should not affect people
trying to test the performance of the current patch.
--
Ramon Lawrence