Re: Oversight in reparameterize_path_by_child leading to executor crash

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oversight in reparameterize_path_by_child leading to executor crash
Date: 2023-08-07 13:29:14
Message-ID: CAExHW5sNd=EDGmkar5pWoibyZGLZvCFDxEhEjU-JWaGLDVyfrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 4, 2023 at 6:08 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

>
> On Thu, Aug 3, 2023 at 7:20 PM Ashutosh Bapat <
> ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
>> However, if reparameterization can *not* happen at the planning time, we
>> have chosen a plan which can not be realised meeting a dead end. So as long
>> as we can ensure that the reparameterization is possible we can delay
>> actual translations. I think it should be possible to decide whether
>> reparameterization is possible based on the type of path alone. So seems
>> doable.
>>
>
> That has been done in v2 patch. See the new added function
> path_is_reparameterizable_by_child().
>

Thanks. The patch looks good overall. I like it because of its potential to
reduce memory consumption in reparameterization. That's cake on top of
cream :)

Some comments here.

+ {
+ FLAT_COPY_PATH(new_path, path, Path);
+
+ if (path->pathtype == T_SampleScan)
+ {
+ Index scan_relid = path->parent->relid;
+ RangeTblEntry *rte;
+
+ /* it should be a base rel with a tablesample clause... */
+ Assert(scan_relid > 0);
+ rte = planner_rt_fetch(scan_relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ Assert(rte->tablesample != NULL);
+
+ ADJUST_CHILD_EXPRS(rte->tablesample, TableSampleClause *);
+ }
+ }

This change makes this function usable only after the final plan has been
selected. If some code accidently uses it earlier, it would corrupt
rte->tablesample without getting detected easily. I think we should mention
this in the function prologue and move it somewhere else. Other option is
to a.
leave rte->tablesample unchanged here with a comment as to why we aren't
changing it b. reparameterize tablesample expression in
create_samplescan_plan() if the path is parameterized by the parent. We call
reparameterize_path_by_child() in create_nestloop_plan() as this patch does
right now. I like that we are delaying reparameterization. It saves a bunch
of
memory. I haven't measured it but from my recent experiments I know it will
be
a lot.

* If the inner path is parameterized, it is parameterized by the
- * topmost parent of the outer rel, not the outer rel itself. Fix
- * that.
+ * topmost parent of the outer rel, not the outer rel itself. We will

There's something wrong with the original sentence (which was probably
written
by me back then :)). I think it should have read "If the inner path is
parameterized by the topmost parent of the outer rel instead of the outer
rel
itself, fix that.". If you agree, the new comment should change it
accordingly.

@@ -2430,6 +2430,16 @@ create_nestloop_path(PlannerInfo *root,
{
NestPath *pathnode = makeNode(NestPath);
Relids inner_req_outer = PATH_REQ_OUTER(inner_path);
+ Relids outerrelids;
+
+ /*
+ * Paths are parameterized by top-level parents, so run parameterization
+ * tests on the parent relids.
+ */
+ if (outer_path->parent->top_parent_relids)
+ outerrelids = outer_path->parent->top_parent_relids;
+ else
+ outerrelids = outer_path->parent->relids;

This looks like an existing bug. Did partitionwise join paths ended up with
extra restrict clauses without this fix? I am not able to see why this
change
is needed by rest of the changes in the patch?

Anyway, let's rename outerrelids to something else e.g. outer_param_relids
to reflect
its true meaning.

+bool
+path_is_reparameterizable_by_child(Path *path)
+{
+ switch (nodeTag(path))
+ {
+ case T_Path:
... snip ...
+ case T_GatherPath:
+ return true;
+ default:
+
+ /* We don't know how to reparameterize this path. */
+ return false;
+ }
+
+ return false;
+}

How do we make sure that any changes to reparameterize_path_by_child() are
reflected here? One way is to call this function from
reparameterize_path_by_child() the first thing and return if the path can
not
be reparameterized.

I haven't gone through each path's translation to make sure that it's
possible
to do that during create_nestloop_plan(). I will do that in my next round of
review if time permits. But AFAIR, each of the translations are possible
during
create_nestloop_plan() when it happens in this patch.

-#define ADJUST_CHILD_ATTRS(node) \
+#define ADJUST_CHILD_EXPRS(node, fieldtype) \
((node) = \
- (List *) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
- child_rel, \
- child_rel->top_parent))
+ (fieldtype) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
+ child_rel, \
+ child_rel->top_parent))
+
+#define ADJUST_CHILD_ATTRS(node) ADJUST_CHILD_EXPRS(node, List *)

IIRC, ATTRS here is taken from the function being called. I think we should
just change the macro definition, not its name. If you would like to have a
separate macro for List nodes, create ADJUST_CHILD_ATTRS_IN_LIST or some
such.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-08-07 13:29:29 Re: initial pruning in parallel append
Previous Message Robert Haas 2023-08-07 13:21:12 Re: initial pruning in parallel append