From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic plans and "initial" pruning
Date: 2022-01-12 14:31:53
Message-ID: CA+HiwqEAdv+1t+v6pkcH9pgsSXnBeHLchDuGRwXP-oEB4BCPmg@mail.gmail.com
Lists: pgsql-hackers

Thanks for taking the time to look at this.
On Wed, Jan 12, 2022 at 1:22 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 24, 2021 at 10:36 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > However, using an idea that Robert suggested to me off-list a little
> > while back, it seems possible to determine the set of partitions that
> > we can safely skip locking. The idea is to look at the "initial" or
> > "pre-execution" pruning instructions contained in a given Append or
> > MergeAppend node when AcquireExecutorLocks() is collecting the
> > relations to lock and consider relations from only those sub-nodes
> > that survive performing those instructions. I've attempted
> > implementing that idea in the attached patch.
>
> Hmm. The first question that occurs to me is whether this is fully safe.
>
> Currently, AcquireExecutorLocks calls LockRelationOid for every
> relation involved in the query. That means we will probably lock at
> least one relation on which we previously had no lock and thus
> AcceptInvalidationMessages(). That will end up marking the query as no
> longer valid and CheckCachedPlan() will realize this and tell the
> caller to replan. In the corner case where we already hold all the
> required locks, we will not accept invalidation messages at this
> point, but must have done so after acquiring the last of the locks
> required, and if that didn't mark the plan invalid, it can't be
> invalid now either. Either way, everything is fine.
>
> With the proposed patch, we might never lock some of the relations
> involved in the query. Therefore, if one of those relations has been
> modified in some way that would invalidate the plan, we will
> potentially fail to discover this, and will use the plan anyway. For
> instance, suppose there's one particular partition that has an extra
> index and the plan involves an Index Scan using that index. Now
> suppose that the scan of the partition in question is pruned, but
> meanwhile, the index has been dropped. Now we're running a plan that
> scans a nonexistent index. Admittedly, we're not running that part of
> the plan. But is that enough for this to be safe? There are things
> (like EXPLAIN or auto_explain) that we might try to do even on a part
> of the plan tree that we don't try to run. Those things might break,
> because for example we won't be able to look up the name of an index
> in the catalogs for EXPLAIN output if the index is gone.
>
> This is just a relatively simple example and I think there are
> probably a bunch of others. There are a lot of kinds of DDL that could
> be performed on a partition that gets pruned away: DROP INDEX is just
> one example. The point is that to my knowledge we have no existing
> case where we try to use a plan that might be only partly valid, so if
> we introduce one, there's some risk there. I thought for a while, too,
> about whether changes to some object in a part of the plan that we're
> not executing could break things for the rest of the plan even if we
> never do anything with the plan but execute it. I can't quite see any
> actual hazard. For example, I thought about whether we might try to
> get the tuple descriptor for the pruned-away object and get a
> different tuple descriptor than we were expecting. I think we can't,
> because (1) the pruned object has to be a partition, and tuple
> descriptors have to match throughout the partitioning hierarchy,
> except for column ordering, which currently can't be changed
> after-the-fact and (2) IIRC, the tuple descriptor is stored in the
> plan and not reconstructed at runtime and (3) if we don't end up
> opening the relation because it's pruned, then we certainly can't do
> anything with its tuple descriptor. But it might be worth giving more
> thought to the question of whether there's any other way we could be
> depending on the details of an object that ended up getting pruned.
I had pondered the possible hazards before writing the patch, mainly
because the concerns raised about a previously discussed proposal were
along similar lines [1].
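To recap the mechanics under discussion: AcquireExecutorLocks() today
unconditionally locks every relation RTE in each statement's range
table, roughly like this (simplified from plancache.c):

    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind != RTE_RELATION)
            continue;

        /*
         * Take the appropriate lock on the relation OID.  We don't
         * actually open the rel here, so this won't fail even if the
         * relation has since been dropped.
         */
        LockRelationOid(rte->relid, rte->rellockmode);
    }

The patch teaches this loop to skip the RTEs that are referenced only
in sub-nodes discarded by the initial pruning steps.
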
IIUC, you're saying that the plan tree may be inspected by non-core
code before ExecutorStart() has initialized the PlanState tree, which
is the structure that omits the pruned portions of the plan tree. I
wouldn't claim to have scanned *all* of the core code that could
possibly access the invalidated portions of the plan tree, but from
what I have seen, I couldn't find any site that does. An
ExecutorStart_hook() does get the chance to do so, but from what I can
see it is expected to call standard_ExecutorStart() before doing its
own thing, and presumably it looks only at the PlanState tree, which
must be valid. EXPLAIN, too, runs ExecutorStart() before it starts
looking at the plan (through the PlanState tree), so it should never
encounter pruned plan tree nodes. All that said, it does sound like
wishful thinking to claim that no problems can possibly occur.
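To illustrate, the usual hook pattern (auto_explain does essentially
this) looks like the skeleton below; my_ExecutorStart is of course a
made-up name:

    #include "postgres.h"
    #include "executor/executor.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    static void
    my_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        /* Chain to the previous hook, if any, else the standard function. */
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);

        /*
         * Only at this point does queryDesc->planstate exist; sub-nodes
         * discarded by initial pruning were never initialized into it.
         */
    }

    void
    _PG_init(void)
    {
        prev_ExecutorStart = ExecutorStart_hook;
        ExecutorStart_hook = my_ExecutorStart;
    }
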
At first, I tried to implement this such that the Append/MergeAppend
nodes are edited to record the result of initial pruning, but it felt
wrong to be munging the plan tree in plancache.c. Then again, maybe
that wouldn't be a concern if performing ExecutorStart() were somehow
made a part of CheckCachedPlan(), which would then take locks on the
relations as the PlanState tree is built, capturing any plan
invalidations along the way, instead of relying on
AcquireExecutorLocks(). That does sound like an ambitious undertaking,
though.
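To make the discussion concrete, the patch's lock-collection walk, on
reaching an Append node that has initial pruning steps, effectively
does something like this hand-wavy sketch (PerformInitialPruning() and
collect_locks() are illustrative names, not the actual patch's):

    Append     *aplan = (Append *) plan;

    if (aplan->part_prune_info != NULL)
    {
        /*
         * Execute only the "initial" pruning steps, which are not
         * allowed to contain volatile expressions, and recurse into
         * the surviving sub-nodes alone; relations referenced only by
         * pruned sub-nodes are never locked.
         */
        Bitmapset  *validsubplans =
            PerformInitialPruning(aplan->part_prune_info);
        int         i = -1;

        while ((i = bms_next_member(validsubplans, i)) >= 0)
            collect_locks(plannedstmt,
                          (Plan *) list_nth(aplan->appendplans, i));
    }
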
> > Note that "initial" pruning steps are now performed twice when
> > executing generic plans: once in AcquireExecutorLocks() to find
> > partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
> > determine the set of partition sub-nodes to be initialized for
> > execution, though I wasn't able to come up with a good idea to avoid
> > this duplication.
>
> I think this is something that will need to be fixed somehow. Apart
> from the CPU cost, it's scary to imagine that the set of nodes on
> which we acquired locks might be different from the set of nodes that
> we initialize. If we do the same computation twice, there must be some
> non-zero probability of getting a different answer the second time,
> even if the circumstances under which it would actually happen are
> remote. Consider, for example, a function that is labeled IMMUTABLE
> but is really VOLATILE. Now maybe you can get the system to lock one
> set of partitions and then initialize a different set of partitions. I
> don't think we want to try to reason about what consequences that
> might have and prove that somehow it's going to be OK; I think we want
> to nail the door shut very tightly to make sure that it can't.
Yeah, the premise of the patch is that the "initial" pruning steps
produce the same result both times. I assumed that would hold because
the pruning steps are not allowed to contain any VOLATILE expressions.
As for the possibility that a function's IMMUTABLE labeling may be
incorrect, I haven't considered whether the runtime pruning code can
cope with that, or whether it should even try to. If such a case were
to occur in practice, the bad outcome would be an Assert failure in
ExecGetRangeTableRelation() in assert-enabled builds, or the use of an
unlocked partition in production builds, the latter of which feels
especially bad.
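For reference, the check in question is this bit of
ExecGetRangeTableRelation(), lightly paraphrased from execUtils.c:

    rel = table_open(rte->relid, NoLock);

    /*
     * In a normal (non-parallel-worker) query we should already hold
     * the appropriate lock; assert-enabled builds verify that here.
     */
    Assert(rte->rellockmode == AccessShareLock ||
           CheckRelationLockedByMe(rel, rte->rellockmode, false));
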
--
Amit Langote
EDB: http://www.enterprisedb.com