Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-08-16 12:35:56
Message-ID: CA+HiwqE_rQ9pZnkXeoHdds2kgAiT7XNNHZW8gTGicBdXv0rwnw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 16, 2024 at 12:35 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Aug 15, 2024 at 8:57 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > TBH, it's more of a hunch that people who are not involved in this
> > development might find the new reality, whereby the execution is not
> > racefree until ExecutorRun(), hard to reason about.
>
> I'm confused by what you mean here by "racefree". A race means
> multiple sessions are doing stuff at the same time and the result
> depends on who does what first, but the executor stuff is all
> backend-private. Heavyweight locks are not backend-private, but those
> would be taken in ExecutorStart(), not ExecutorRun(), IIUC.

Sorry, yes, I meant ExecutorStart(). A backend that wants to execute
a plan tree from a CachedPlan is in a race with other backends that
might modify tables before ExecutorStart() takes the remaining locks.
That race window is bigger when it is ExecutorStart() that takes the
locks, not in terms of timing but in terms of the other code that can
run, depending on the calling module, between GetCachedPlan()
returning a partially valid plan and ExecutorStart() taking the
remaining locks.

> > With the patch, CreateQueryDesc() and ExecutorStart() are moved to
> > PortalStart() so that QueryDescs including the PlanState trees for all
> > queries are built before any is run. Why? So that if ExecutorStart()
> > fails for any query in the list, we can simply throw out the QueryDesc
> > and the PlanState trees of the previous queries (NOT run them) and ask
> > plancache for a new CachedPlan for the list of queries. We don't have
> > a way to ask plancache.c to replan only a given query in the list.
>
> I agree that moving this from PortalRun() to PortalStart() seems like
> a bad idea, especially in view of what you write below.
>
> > * There's no longer CCI() between queries in PortalRunMulti() because
> > the snapshots in each query's QueryDesc must have been adjusted to
> > reflect the correct command counter. I've checked but can't really be
> > sure if the value in the snapshot is all anyone ever uses if they want
> > to know the current value of the command counter.
>
> I don't think anything stops somebody wanting to look at the current
> value of the command counter. I also don't think you can remove the
> CommandCounterIncrement() calls between successive queries, because
> then they won't see the effects of earlier calls. So this sounds
> broken to me.

I suppose you mean the CCI between "running" (calling ExecutorRun()
on) successive queries. Then the patch is indeed broken. If we're
to make that right, the number of CCIs for multi-query portals will
have to double, given the separation of the ExecutorStart() and
ExecutorRun() phases.

> Also keep in mind that one of the queries could call a function which
> does something that bumps the command counter again. I'm not sure if
> that creates its own hazard separate from the lack of CCIs, or
> whether it's just another part of that same issue. But you can't
> assume that each query's snapshot should have a command counter value
> one more than the previous query.
>
> While this all seems bad for the partially-initialized-execution-tree
> approach, I wonder if you don't have problems here with the other
> design, too. Let's say you've the multi-query case and there are 2
> queries. The first one (Q1) is SELECT mysterious_function() and the
> second one (Q2) is SELECT * FROM range_partitioned_table WHERE
> key_column = 42. What if mysterious_function() performs DDL on
> range_partitioned_table? I haven't tested this so maybe there are
> things going on here that prevent trouble, but it seems like executing
> Q1 can easily invalidate the plan for Q2. And then it seems like
> you're basically back to the same problem.

A rule (but not a view, AFAICS) can lead to the multi-query case
(there might be other ways). I tried the following, and, yes, the
plan for the query added by the rule is broken by the execution of
the plan for the 1st query:

create table foo (a int);
create table bar (a int);
create or replace function foo_trig_func () returns trigger as $$
begin drop table bar cascade; return new.*; end; $$ language plpgsql;
create trigger foo_trig before insert on foo execute function foo_trig_func();
create rule insert_foo as on insert to foo do also insert into bar values (new.*);
set plan_cache_mode to force_generic_plan;
prepare q as insert into foo values (1);
execute q;
NOTICE: drop cascades to rule insert_foo on table foo
ERROR: relation with OID 16418 does not exist

The ERROR comes from trying to run (actually "initialize") the cached
plan for `insert into bar values (new.*);` which is due to the rule.

Though, it doesn't have to be a cached plan for the breakage to
happen. You can see the same error without the prepared statement:

insert into foo values (1);
NOTICE: drop cascades to rule insert_foo on table foo
ERROR: relation with OID 16418 does not exist

Another example:

create or replace function foo_trig_func () returns trigger as $$
begin alter table bar add b int; return new.*; end; $$ language plpgsql;
execute q;
ERROR: table row type and query-specified row type do not match
DETAIL: Query has too few columns.

insert into foo values (1);
ERROR: table row type and query-specified row type do not match
DETAIL: Query has too few columns.

This time the error occurs in ExecModifyTable(), so while "running"
the plan, but again the code throwing the error is just the "lazy"
initialization of the ProjectionInfo for inserting into bar.

So it is possible for the executor to try to run a plan that has
become invalid since it was created, so...

> > > > 3. The need to add *back* the fields to store the RT indexes of
> > > > relations that are not looked at by ExecInitNode() traversal such as
> > > > root partitioned tables and non-leaf partitions.
> > >
> > > I don't remember exactly why we removed those or what the benefit was,
> > > so I'm not sure how big of a problem it is if we have to put them
> > > back.
> >
> > We removed those in commit 52ed730d511b after commit f2343653f5b2
> > removed redundant execution-time locking of non-leaf relations. So we
> > removed them because we realized that execution time locking is
> > unnecessary given that AcquireExecutorLocks() exists and now we want
> > to add them back because we'd like to get rid of
> > AcquireExecutorLocks(). :-)
>
> My bias is to believe that getting rid of AcquireExecutorLocks() is
> probably the right thing to do, but that's not a strongly-held
> position and I could be totally wrong about it. The thing is, though,
> that AcquireExecutorLocks() is fundamentally stupid, and it's hard to
> see how it can ever be any smarter. If we want to make smarter
> decisions about what to lock, it seems reasonable to me to think that
> the locking code needs to be closer to code that can evaluate
> expressions and prune partitions and stuff like that.

One perhaps crazy idea [1]:

What if we remove AcquireExecutorLocks() and move the responsibility
for taking the remaining necessary locks into the executor (those on
any inheritance children that are added during planning and thus not
accounted for by AcquirePlannerLocks()), like the patch already does,
but don't make the executor also check whether the plan has become
invalid, which it can't do anyway unless the plan is from a
CachedPlan? We would instead let the executor throw whatever errors
occur when initializing the plan because of changes to the objects
referenced in it, like those in the examples above. If that case is
going to be rare anyway, why spend energy on checking validity and
replanning, especially when that's not an easy thing to do, as we're
finding out. In the above example, we could say that it's a user
error to create a rule like that, so it should not happen in
practice; when it does, the executor seems to deal with it correctly
by refusing to execute a broken plan. Perhaps it's more worthwhile
to make the executor behave correctly in the face of plan
invalidation than to teach the rest of the system to deal with the
executor throwing its hands up when it runs into an invalid plan?
Again, this may be a crazy line of thinking, but I just wanted to
get it out there.

--
Thanks, Amit Langote

[1] I recall Michael Paquier mentioning something like this to me once
when I was describing this patch and thread to him.
