Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2023-08-08 14:32:07
Message-ID: CA+HiwqGFYpPRxQVUfeds1QX1U6o1QkKMYjHrjn0+0XEcgUV7+A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 8, 2023 at 12:36 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Aug 3, 2023 at 4:37 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > Here's a patch set where the refactoring to move the ExecutorStart()
> > calls to be closer to GetCachedPlan() (for the call sites that use a
> > CachedPlan) is extracted into a separate patch, 0002. Its commit
> > message notes an aspect of this refactoring that I feel a bit nervous
> > about -- needing to also move the CommandCounterIncrement() call from
> > the loop in PortalRunMulti() to PortalStart() which now does
> > ExecutorStart() for the PORTAL_MULTI_QUERY case.
>
> I spent some time today reviewing 0001. Here are a few thoughts and
> notes about things that I looked at.

Thanks for taking a look at this.

> First, I wondered whether it was really adequate for ExecEndPlan() to
> just loop over estate->es_plan_nodes and call it good. Put
> differently, is it possible that we could ever have more than one
> relevant EState, say for a subplan or an EPQ execution or something,
> so that this loop wouldn't cover everything? I found nothing to make
> me think that this is a real danger.

Check.

> Second, I wondered whether the ordering of cleanup operations could be
> an issue. Right now, a node can position cleanup code before, after,
> or both before and after recursing to child nodes, whereas with this
> design change, the cleanup code will always be run before recursing to
> child nodes.

Because a node is appended to es_planstate_nodes at the end of
ExecInitNode(), child nodes get added before their parent nodes. So
the children are cleaned up first.
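To illustrate the ordering (a simplified stand-alone model, not the patch's actual code; the struct and function names here only mimic es_planstate_nodes and ExecInitNode()):

```c
#include <stddef.h>

/*
 * Simplified model of the append-at-end-of-ExecInitNode scheme: a node is
 * appended to the estate's list only after its children have been
 * initialized, so a forward walk over the list visits children before
 * their parents.
 */
#define MAX_NODES 16

typedef struct ModelNode
{
	const char *name;
	struct ModelNode *left;		/* child subtree */
	struct ModelNode *right;	/* child subtree */
} ModelNode;

typedef struct ModelEState
{
	ModelNode  *planstate_nodes[MAX_NODES];	/* stand-in for es_planstate_nodes */
	int			count;
} ModelEState;

/* Analogue of ExecInitNode(): recurse into children first, append self last. */
static void
model_init_node(ModelEState *estate, ModelNode *node)
{
	if (node == NULL)
		return;
	model_init_node(estate, node->left);
	model_init_node(estate, node->right);
	estate->planstate_nodes[estate->count++] = node;
}
```

So an ExecEndPlan() that simply walks planstate_nodes[] front to back ends each child before its parent.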

> Here, I think we have problems. Both ExecGather and
> ExecEndGatherMerge intentionally clean up the children before the
> parent, so that the child shutdown happens before
> ExecParallelCleanup(). Based on the comment and commit
> acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
> intentional, and you can sort of see why from looking at the stuff
> that happens in ExecParallelCleanup(). If the instrumentation data
> vanishes before the child nodes have a chance to clean things up,
> maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
> the DSA vanishes, maybe we'll crash if we try to access it. If we
> actually reach DestroyParallelContext(), we're just going to start
> killing the workers. None of that sounds like what we want.
>
> The good news, of a sort, is that I think this might be the only case
> of this sort of problem. Most nodes recurse at the end, after doing
> all the cleanup, so the behavior won't change. Moreover, even if it
> did, most cleanup operations look pretty localized -- they affect only
> the node itself, and not its children. A somewhat interesting case is
> nodes associated with subplans. Right now, because of the coding of
> ExecEndPlan, nodes associated with subplans are all cleaned up at the
> very end, after everything that's not inside of a subplan. But with
> this change, they'd get cleaned up in the order of initialization,
> which actually seems more natural, as long as it doesn't break
> anything, which I think it probably won't, since as I mention in most
> cases node cleanup looks quite localized, i.e. it doesn't care whether
> it happens before or after the cleanup of other nodes.
>
> I think something will have to be done about the parallel query stuff,
> though. I'm not sure exactly what. It is a little weird that Gather
> and Gather Merge treat starting and killing workers as a purely
> "private matter" that they can decide to handle without the executor
> overall being very much aware of it. So maybe there's a way that some
> of the cleanup logic here could be hoisted up into the general
> executor machinery, that is, first end all the nodes, and then go
> back, and end all the parallelism using, maybe, another list inside of
> the estate. However, I think that the existence of ExecShutdownNode()
> is a complication here -- we need to make sure that we don't break
> either the case where that happen before overall plan shutdown, or the
> case where it doesn't.

Given that children are cleaned up before their parents, the order of
operations in ExecEndGather[Merge] is unchanged.

> Third, a couple of minor comments on details of how you actually made
> these changes in the patch set. Personally, I would remove all of the
> "is closed separately" comments that you added. I think it's a
> violation of the general coding principle that you should make the
> code look like it's always been that way. Sure, in the immediate
> future, people might wonder why you don't need to recurse, but 5 or 10
> years from now that's just going to be clutter. Second, in the cases
> where the ExecEndNode functions end up completely empty, I would
> suggest just removing the functions entirely and making the switch
> that dispatches on the node type have a switch case that lists all the
> nodes that don't need a callback here and say /* Nothing to do for these
> node types */ break;. This will save a few CPU cycles and I think it
> will be easier to read as well.

I agree with both suggestions.

> Fourth, I wonder whether we really need this patch at all. I initially
> thought we did, because if we abandon the initialization of a plan
> partway through, then we end up with a plan that is in a state that
> previously would never have occurred, and we still have to be able to
> clean it up. However, perhaps it's a difference without a distinction.
> Say we have a partial plan tree, where not all of the PlanState nodes
> ever got created. We then just call the existing version of
> ExecEndPlan() on it, with no changes. What goes wrong? Sure, we might
> call ExecEndNode() on some null pointers where in the current world
> there would always be valid pointers, but ExecEndNode() will handle
> that just fine, by doing nothing for those nodes, because it starts
> with a NULL-check.

Well, not all cleanup actions for a given node type are a recursive
call to ExecEndNode(); some are things like this:

    /*
     * clean out the tuple table
     */
    ExecClearTuple(node->ps.ps_ResultTupleSlot);

But should the ExecInitNode() subroutines return the partially
initialized PlanState node or NULL upon detecting invalidation? If
I'm understanding correctly how you think this should work, you mean
the former, because if it were the latter, ExecInitNode() would end up
returning NULL at the top for the root node, leaving nothing to pass
to ExecEndNode() and thus no way to clean up to begin with. In that
case, I think we will need to adjust the ExecEndNode() subroutines to
add a guard like "if (node->ps.ps_ResultTupleSlot)" around code such
as the above. That's something Tom had said he doesn't like very
much [1].
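For example, the adjustment would look something like this (a
self-contained sketch, not PostgreSQL's actual types; the fields here
only model ps_ResultTupleSlot and ExecClearTuple()):

```c
#include <stddef.h>

/*
 * Sketch of guarding per-node cleanup against partial initialization:
 * the slot pointer stays NULL if ExecInitNode() bailed out before
 * creating it, so the End routine must check before touching it.
 */
typedef struct ModelSlot
{
	int			nvalid;			/* number of valid values in the slot */
} ModelSlot;

typedef struct ModelResultState
{
	ModelSlot  *ps_ResultTupleSlot;	/* NULL if init was abandoned early */
} ModelResultState;

/* ExecClearTuple() analogue */
static void
model_clear_tuple(ModelSlot *slot)
{
	slot->nvalid = 0;
}

/* ExecEndResult() analogue with the added NULL check */
static void
model_end_result(ModelResultState *node)
{
	if (node->ps_ResultTupleSlot)	/* the guard in question */
		model_clear_tuple(node->ps_ResultTupleSlot);
}
```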

Some node types such as Append, BitmapAnd, etc. that contain a list of
subplans would need some adjustment, such as using palloc0 for
as_appendplans[], etc. so that uninitialized subplans have NULL in the
array.
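The point of the zeroed allocation being that entries for subplans
never reached by initialization remain NULL, so the cleanup loop can
simply skip them. A sketch, using calloc in place of palloc0 and a
field name that mirrors Append's as_appendplans:

```c
#include <stdlib.h>

typedef struct ModelPlanState
{
	int			ended;
} ModelPlanState;

typedef struct ModelAppendState
{
	ModelPlanState **as_appendplans;	/* NULL entries = uninitialized */
	int			as_nplans;
} ModelAppendState;

/* Init analogue: allocate the subplan array zeroed (palloc0 analogue). */
static ModelAppendState *
model_init_append(int nplans, int ninitialized)
{
	ModelAppendState *as = calloc(1, sizeof(ModelAppendState));
	int			i;

	as->as_nplans = nplans;
	/* zeroed allocation: slots beyond ninitialized stay NULL */
	as->as_appendplans = calloc(nplans, sizeof(ModelPlanState *));
	for (i = 0; i < ninitialized; i++)
		as->as_appendplans[i] = calloc(1, sizeof(ModelPlanState));
	return as;
}

/* End analogue: skip the subplans that were never initialized. */
static void
model_end_append(ModelAppendState *as)
{
	int			i;

	for (i = 0; i < as->as_nplans; i++)
		if (as->as_appendplans[i])
			as->as_appendplans[i]->ended = 1;
}
```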

There are also issues around the ForeignScan and CustomScan
ExecEndNode()-time callbacks when those nodes are partially
initialized -- is it OK to call the *EndScan callback if the
*BeginScan one may never have been called to begin with? Though,
perhaps we can adjust the ExecInitNode() subroutines for those node
types to return NULL by opening the relation and checking for
invalidation at the beginning instead of in the middle. That should
be done for all Scan or leaf-level node types.
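The shape of that adjustment might be something like the following
(purely illustrative; the invalidation flag and names are stand-ins
for the real relation-open-and-check logic):

```c
#include <stdlib.h>

typedef struct ModelScanState
{
	int			dummy;
} ModelScanState;

/*
 * Sketch: a scan node's init routine that performs the invalidation
 * check up front, before allocating any per-node state, so that it can
 * return NULL with nothing left over to clean up -- and in particular
 * before any Begin-style callback would have run.
 */
static ModelScanState *
model_init_scan(int plan_invalidated)
{
	/* check first: open the relation, detect invalidation, bail early */
	if (plan_invalidated)
		return NULL;			/* nothing allocated, nothing to end */

	return calloc(1, sizeof(ModelScanState));
}
```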

Anyway, I guess, for the patch's purpose, maybe we should bite the
bullet and make those adjustments rather than change ExecEndNode() as
proposed. I can give that another try.

> Another alternative design might be to switch ExecEndNode to use
> planstate_tree_walker to walk the node tree, removing the walk from
> the node-type-specific functions as in this patch, and deleting the
> end-node functions that are no longer required altogether, as proposed
> above. I somehow feel that this would be cleaner than the status quo,
> but here again, I'm not sure we really need it. planstate_tree_walker
> would just pass over any NULL pointers that it found without doing
> anything, but the current code does that too, so while this might be
> more beautiful than what we have now, I'm not sure that there's any
> real reason to do it. The fact that, like the current patch, it would
> change the order in which nodes are cleaned up is also an issue -- the
> Gather/Gather Merge ordering issues might be easier to handle this way
> with some hack in ExecEndNode() than they are with the design you have
> now, but we'd still have to do something about them, I believe.

It might be interesting to see if introducing planstate_tree_walker()
in ExecEndNode() makes ExecEndNode() easier to reason about generally
speaking, but I think you may be right that doing so may not really
make matters easier for the partially initialized planstate tree case.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
