Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-08-20 13:00:38
Message-ID: CA+HiwqHkjicWzfAjB6_SVsVmKF6omQ4EBHr+GTUgJNN7WiUDag@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 20, 2024 at 1:39 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 16, 2024 at 8:36 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > So it is possible for the executor to try to run a plan that has
> > become invalid since it was created, so...
>
> I'm not sure what the "so what" here is.

I meant that if the executor has to deal with broken plans anyway, we
might as well lean into that fact instead of special-casing only the
cached-plan path. Yes, I understand that's not a good justification.

> > One perhaps crazy idea [1]:
> >
> > What if we remove AcquireExecutorLocks() and move the responsibility
> > of taking the remaining necessary locks into the executor (those on
> > any inheritance children that are added during planning and thus not
> > accounted for by AcquirePlannerLocks()), like the patch already does,
> > but don't make it also check if the plan has become invalid, which it
> > can't do anyway unless it's from a CachedPlan. That means we instead
> > let the executor throw any errors that occur when trying to either
> > initialize the plan because of the changes that have occurred to the
> > objects referenced in the plan, like what is happening in the above
> > example. If that case is going to be rare anyway, why spend energy on
> > checking the validity and replan, especially if that's not an easy
> > thing to do as we're finding out. In the above example, we could say
> > that it's a user error to create a rule like that, so it should not
> > happen in practice, but when it does, the executor seems to deal with
> > it correctly by refusing to execute a broken plan . Perhaps it's more
> > worthwhile to make the executor behave correctly in face of plan
> > invalidation than teach the rest of the system to deal with the
> > executor throwing its hands up when it runs into an invalid plan?
> > Again, I think this may be a crazy line of thinking but just wanted to
> > get it out there.
>
> I don't know whether this is crazy or not. I think there are two
> issues. One, the set of checks that we have right now might not be
> complete, and we might just not have realized that because it happens
> infrequently enough that we haven't found all the bugs. If that's so,
> then a change like this could be a good thing, because it might force
> us to fix stuff we should be fixing anyway. I have a feeling that some
> of the checks you hit there were added as bug fixes long after the
> code was written originally, so my confidence that we don't have more
> bugs isn't especially high.

This makes sense.

> And two, it matters a lot how frequent the errors will be in practice.
> I think we normally try to replan rather than let a stale plan be used
> because we want to not fail, because users don't like failure. If the
> design you propose here would make failures more (or less) frequent,
> then that's a problem (or awesome).

I think we'd modify plancache.c to postpone the locking of only
prunable relations (i.e., partitions), so we're looking at only a
handful of concurrent modifications that are going to cause execution
errors. That's because we disallow many DDL modifications of
partitions unless they are done via recursion from the parent, so the
space of errors in practice would be smaller compared to if we were to
postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX
a_partition_only_index comes to mind as something that might cause an
error. I've not tested if other partition-only constraints can cause
unsafe behaviors.

Perhaps we can add a check of CachedPlan.is_valid after every
table_open() and index_open() in the executor that takes a lock, or at
all the places we discussed previously, and throw an error (say:
"cached plan is no longer valid") if it's false. That's better than
soldiering ahead with initialization / execution and running into some
random error, but it is still a loss in terms of user experience
because we're adding a new failure mode, however rare.

--
Thanks, Amit Langote
