Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-12-09 07:10:24
Message-ID: CA+HiwqHCcSoYfpMjFshaU1bj6NjreiDvMSDpVSeBmqk-kbWrPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 6, 2024 at 5:18 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> > On 12/5/24 12:28, Amit Langote wrote:
> > > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> > >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> > >>> might be a way to not have this issue, or at least notice the missing
> > >>> call early.
> > >>>
> > >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> > >>> the retries internally? I mean, the extensions wouldn't need to check if
> > >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> > >>> it might need some new arguments, but that's more obvious.
> > >>
> > >> One approach could be to move some code from standard_ExecutorStart()
> > >> into ExecutorStart(). Specifically, the code responsible for setting
> > >> up enough state in the EState to perform ExecDoInitialPruning(), which
> > >> takes locks that might invalidate the plan. If the plan does become
> > >> invalid, the hook and standard_ExecutorStart() are not called.
> > >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> > >> plan.
> > >>
> > >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> > >> whether in core or extension code. However, it does mean accessing the
> > >> PlannedStmt earlier than InitPlan(), but the current placement of the
> > >> code is not exactly set in stone.
> > >
> > > I tried this approach and found that it essentially disables testing
> > > of this patch using the delay_execution module, which relies on the
> > > ExecutorStart_hook(). The way the testing works is that the hook in
> > > delay_execution.c pauses the execution of a cached plan to allow a
> > > concurrent session to drop an index referenced in the plan. When
> > > unpaused, execution initialization resumes by calling
> > > standard_ExecutorStart(). At this point, obtaining the lock on the
> > > partition whose index has been dropped invalidates the plan, which the
> > > hook detects and reports. It then also reports the successful
> > > re-execution of an updated plan that no longer references the dropped
> > > index. Hmm.
> > >
> >
> > It's not clear to me why the change disables this testing, and I can't
> > try without a patch. Could you explain?
>
> Sorry, PFA the delta patch for the change I described above. It
> applies on top of v58 series of patches that I posted yesterday.
> You'll notice that delay_execution test fails if you apply and do
> check-world.
>
> As for how the change breaks the testing, here is a before and after
> of the flow of a isolation test in
> src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
> the session used to run a cached plan, s2 to perform concurrent DDL
> that invalidates the plan):
>
> * Before (working):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
> advisory lock
> 3. s2 drops an index referenced in the plan
> 4. s2 unlocks advisory lock
> 5. s1 locks unpruned partitions -> detects plan invalidation due to
> dropped index.
>
> * After (stops working because initial pruning and locking are done
> before calling ExecutorStart_hook):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> locks unpruned partitions -> goes to
> ExecutorStart_hook to get advisory lock -> waits for advisory lock
> 3. s2 drops an index referenced in the plan -> waits for lock on the
> unpruned partition -> deadlock!
>
> One idea I had after sending the email yesterday is to introduce
> ExecutorStartCachedPlan_hook for the advisory lock based waiting.
> ExecutorStartCachedPlan() is the new function that you will find in
> v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
> new hook would be called before ExecutorStartCachedPlan() calls
> ExecutorStart(), so the original testing flow can still work.

Here's that patch with this idea implemented that fixes the
delay_execution test breakage. Applies on top of v58 series of
patches.

However, as mentioned in my previous reply, since extensions might
need to adjust their ExecutorStart hook code to check if the RT index
is in EState.es_unpruned_relids when accessing child relations
directly via ExecGetRangeTableRelation(), I can accept them also
adding a check for ExecPlanStillValid() in their ExecutorStart hook.
So we may not want to add a new hook even if only for testing.

--
Thanks, Amit Langote

Attachment Content-Type Size
0005-Remove-the-need-to-check-if-plan-is-valid-from-Execu.patch application/octet-stream 23.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-12-09 07:14:54 remove pgrminclude?
Previous Message Alexander Korotkov 2024-12-09 06:36:25 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands