Re: parallel append vs. simple UNION ALL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel append vs. simple UNION ALL
Date: 2018-03-08 19:58:10
Message-ID: CA+TgmoYqh8Obik=4fL4VZpB8YRhnwXuWNZMSpGiMK5uDgLnu0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 1, 2018 at 8:30 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> The patches look clean. I particularly looked at 0003.
>
> patch 0001
> + /*
> + * Generate partial paths for final_rel, too, if outer query levels might
> + * be able to make use of them.
> + */
> I am not able to understand the construct esp. the if clause. Did you want to
> say "... if there are outer query levels. Those might ..." or something like
> that?

Well, that's what I meant, but I didn't think it was necessary to
spell it out in quite that much detail.

> 0002
> (op->all == top_union->all || op->all) &&
> This isn't really your change. Checking
> op->all is cheaper than checking equality, so may be we should check that first
> and take advantage of short-circuit condition evaluation. If we do that above
> condition reduces to (op->all || !top_union->all) which is two boolean
> conditions, even cheaper. But may be the second optimization is not worth the
> loss of readability.

I doubt this makes any difference. The compiler should be smart
enough to do this the same way regardless of exactly how we write it,
and if it's not, it can't make more than a few instructions worth of
difference. This code is not nearly performance-critical enough for
that to matter. Also, it's not the job of this patch to whack this
around.

> "identically-propertied UNIONs" may be "UNIONs with identical properties".

Likewise, I didn't write those words, so I don't plan to change them
just because I might have written them differently.

> 0003
> Probably we want to rename generate_union_path() as generate_union_rel() or
> generate_union_paths() since the function doesn't return a path anymore.
> Similarly for generate_nonunion_path().

Good point. Changed.

> In recurse_set_operations()
> - return NULL; /* keep compiler quiet */
> This line is deleted and instead rel is initialized to NULL. That way we loose
> any chance to detect a future bug because of a block leaving rel uninitialized
> through compiler warning. May be we should replace "return NULL" with "rel =
> NULL", which will not be executed because of the error.

I don't think this is really a problem. If some code path fails to
initialize rel, it's going to crash when postprocess_setop_rel() calls
set_cheapest(). Any warning the compiler gives is more likely to be a
false-positive than an actual problem.

> + /* Build path list and relid set. */
> + foreach(lc, rellist)
> + {
> With the changes in this patch, we could actually use add_paths_to_append_rel()
> to create an append path. That function builds paths with different pathkeys,
> parameterization (doesn't matter here) and also handles parallel append. So we
> can avoid code duplication and also leverage more optimizations like using
> MergeAppend instead of overall sort etc. But that function doesn't have ability
> to add a final node like make_union_unique(). A similar requirement has arisen
> in partition-wise join where we need to add a final node for finalising
> aggregate on top of paths created by add_paths_to_append_rel(). May be we can
> change that function to return a list of paths, which are then finalized by the
> caller and added to "append" rel. But I don't think doing all that is in the
> scope of this patch set.

Yeah, I thought about all of that and came to similar conclusions.

> 0004
> + if (!op->all)
> + ppath = make_union_unique(op, ppath, tlist, root);
> We could probably push the grouping/sorting down to the parallel workers. But
> again not part of this patchset.

Yep. There's a lot more work that could be done to improve setop
planning, but I think getting even this much done for v11 would be a
significant step forward.

Updated patches attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company<div class="gmail_extra"><br><div
class="gmail_quote">On Thu, Mar 1, 2018 at 8:30 AM, Ashutosh Bapat
<span dir="ltr">&lt;<a href="mailto:ashutosh(dot)bapat(at)enterprisedb(dot)com"
target="_blank">ashutosh(dot)bapat(at)enterprisedb(dot)com</a>&gt;</span>
wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On
Sat, Feb 24, 2018 at 2:55 AM, Robert Haas &lt;<a
href="mailto:robertmhaas(at)gmail(dot)com">robertmhaas(at)gmail(dot)com</a>&gt;
wrote:<br>
<br>
&gt;<br>
</span><span class="">&gt; Here's an extended series of patches that
now handles both the simple<br>
&gt; UNION ALL case (where we flatten it) and the unflattened case:<br>
&gt;<br>
<br>
</span>The patches look clean. I particularly looked at 0003.<br>
<br>
patch 0001<br>
+&nbsp; &nbsp; /*<br>
+&nbsp; &nbsp; &nbsp;* Generate partial paths for final_rel, too, if
outer query levels might<br>
+&nbsp; &nbsp; &nbsp;* be able to make use of them.<br>
+&nbsp; &nbsp; &nbsp;*/<br>
I am not able to understand the construct esp. the if clause. Did you
want to<br>
say "... if there are outer query levels. Those might ..." or something like<br>
that?<br>
<br>
0002<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (op-&gt;all ==
top_union-&gt;all || op-&gt;all) &amp;&amp;<br>
This isn't really your change.&nbsp; Checking<br>
op-&gt;all is cheaper than checking equality, so may be we should
check that first<br>
and take advantage of short-circuit condition evaluation. If we do
that above<br>
condition reduces to (op-&gt;all || !top_union-&gt;all) which is two boolean<br>
conditions, even cheaper. But may be the second optimization is not
worth the<br>
loss of readability.<br>
<br>
"identically-propertied UNIONs" may be "UNIONs with identical properties".<br>
<br>
0003<br>
Probably we want to rename generate_union_path() as generate_union_rel() or<br>
generate_union_paths() since the function doesn't return a path anymore.<br>
Similarly for generate_nonunion_path().<br>
<br>
In recurse_set_operations()<br>
-&nbsp; &nbsp; &nbsp; &nbsp; return NULL;&nbsp; &nbsp; &nbsp; &nbsp;
&nbsp; &nbsp; /* keep compiler quiet */<br>
This line is deleted and instead rel is initialized to NULL. That way
we loose<br>
any chance to detect a future bug because of a block leaving rel
uninitialized<br>
through compiler warning. May be we should replace "return NULL" with "rel =<br>
NULL", which will not be executed because of the error.<br>
<br>
+&nbsp; &nbsp; /* Build path list and relid set. */<br>
+&nbsp; &nbsp; foreach(lc, rellist)<br>
+&nbsp; &nbsp; {<br>
With the changes in this patch, we could actually use
add_paths_to_append_rel()<br>
to create an append path. That function builds paths with different
pathkeys,<br>
parameterization (doesn't matter here) and also handles parallel
append. So we<br>
can avoid code duplication and also leverage more optimizations like using<br>
MergeAppend instead of overall sort etc. But that function doesn't
have ability<br>
to add a final node like make_union_unique(). A similar requirement
has arisen<br>
in partition-wise join where we need to add a final node for finalising<br>
aggregate on top of paths created by add_paths_to_append_rel().&nbsp;
May be we can<br>
change that function to return a list of paths, which are then
finalized by the<br>
caller and added to "append" rel. But I don't think doing all that is in the<br>
scope of this patch set.<br>
<br>
0004<br>
+&nbsp; &nbsp; &nbsp; &nbsp; if (!op-&gt;all)<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ppath =
make_union_unique(op, ppath, tlist, root);<br>
We could probably push the grouping/sorting down to the parallel
workers. But<br>
again not part of this patchset.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Best Wishes,<br>
Ashutosh Bapat<br>
</font></span><div class="HOEnZb"><div class="h5">EnterpriseDB Corporation<br>
The Postgres Database Company<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>--
<br><div class="gmail_signature"
data-smartmail="gmail_signature">Robert Haas<br>EnterpriseDB: <a
href="http://www.enterprisedb.com"
target="_blank">http://www.enterprisedb.com</a><br>The Enterprise
PostgreSQL Company</div>
</div>

Attachment Content-Type Size
0004-Consider-Parallel-Append-as-a-way-to-implement-a-uni.patch application/octet-stream 5.0 KB
0003-Generate-a-separate-upper-relation-for-each-stage-of.patch application/octet-stream 20.9 KB
0002-Rewrite-recurse_union_children-to-iterate-rather-tha.patch application/octet-stream 5.7 KB
0001-Let-Parallel-Append-over-simple-UNION-ALL-have-parti.patch application/octet-stream 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-08 19:58:41 Re: JIT compiling with LLVM v11
Previous Message Robert Haas 2018-03-08 19:34:38 Re: parallel append vs. simple UNION ALL