From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal : Parallel Merge Join |
Date: | 2017-02-14 11:52:42 |
Message-ID: | CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 14, 2017 at 12:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Few review comments on the latest version of the patch:
> 1.
> - if (joinrel->consider_parallel && nestjoinOK &&
> - save_jointype != JOIN_UNIQUE_OUTER &&
> - bms_is_empty(joinrel->lateral_relids))
> + if (outerrel->partial_pathlist == NIL ||
> + !joinrel->consider_parallel ||
> + save_jointype == JOIN_UNIQUE_OUTER ||
> + !bms_is_empty(joinrel->lateral_relids) ||
> + jointype == JOIN_FULL ||
> + jointype == JOIN_RIGHT)
> + return;
>
>
> For the matter of consistency, I think the above checks should be in
> same order as they are in function hash_inner_and_outer(). I wonder
> why are you using jointype in above check instead of save_jointype, it
> seems to me we can use save_jointype for all the cases.
Done
>
> 2.
> + generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
> +
> jointype, extra, false,
> +
> inner_cheapest_total, merge_pathkeys,
> +
> true);
>
> IIUC, the reason of passing a 7th parameter (useallclauses) as false
> is that it can be true only for Right and Full joins and for both we
> don't generate partial merge join paths. If so, then I think it is
> better to add a comment about such an assumption, so that we don't
> forget to update this code in future if we need to useallclauses for
> something else. Another way to deal with this is that you can pass
> the value of useallclauses to consider_parallel_mergejoin() and then
> to generate_mergejoin_paths(). I feel second way is better, but if
> for some reason you don't find it appropriate then at least add a
> comment.
After fixing 3rd comments this will not be needed.
>
> 3.
> generate_mergejoin_paths()
> {
>
> The above and similar changes in generate_mergejoin_paths() doesn't
> look good and in some cases when innerpath is *parallel-unsafe*, you
> need to perform some extra computation which is not required. How
> about writing a separate function generate_partial_mergejoin_paths()?
> I think you can save some extra computation and it will look neat. I
> understand that there will be some code duplication, but not sure if
> there is any other better way.
Okay, I have done as suggested.
Apart from this, there was one small problem in an earlier version
which I have corrected in this.
+ /* Consider only parallel safe inner path */
+ if (innerpath != NULL &&
+ innerpath->parallel_safe &&
+ (cheapest_total_inner == NULL ||
+ cheapest_total_inner->parallel_safe == false ||
+ compare_path_costs(innerpath, cheapest_total_inner,
+ TOTAL_COST) < 0))
In this comparison, we were only checking if innerpath is cheaper than
the cheapest_total_inner then generate path with this new inner path
as well. Now, I have added one more check if cheapest_total_inner was
not parallel safe then also consider a path with this new inner
(provided this inner is parallel safe).
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
parallel_mergejoin_v5.patch | application/octet-stream | 13.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2017-02-14 12:00:16 | Re: Should we cacheline align PGXACT? |
Previous Message | Simon Riggs | 2017-02-14 11:52:12 | Re: Measuring replay lag |