Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-12-06 08:18:47
Message-ID: CA+HiwqFhkpXHAA=4NY5SqYXX08uq=nYtXcSByNZF=2MAy1UA7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 12/5/24 12:28, Amit Langote wrote:
> > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> >>> might be a way to not have this issue, or at least notice the missing
> >>> call early.
> >>>
> >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> >>> the retries internally? I mean, the extensions wouldn't need to check if
> >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> >>> it might need some new arguments, but that's more obvious.
> >>
> >> One approach could be to move some code from standard_ExecutorStart()
> >> into ExecutorStart(). Specifically, the code responsible for setting
> >> up enough state in the EState to perform ExecDoInitialPruning(), which
> >> takes locks that might invalidate the plan. If the plan does become
> >> invalid, the hook and standard_ExecutorStart() are not called.
> >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> >> plan.
> >>
> >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> >> whether in core or extension code. However, it does mean accessing the
> >> PlannedStmt earlier than InitPlan(), but the current placement of the
> >> code is not exactly set in stone.
> >
> > I tried this approach and found that it essentially disables testing
> > of this patch using the delay_execution module, which relies on the
> > ExecutorStart_hook(). The way the testing works is that the hook in
> > delay_execution.c pauses the execution of a cached plan to allow a
> > concurrent session to drop an index referenced in the plan. When
> > unpaused, execution initialization resumes by calling
> > standard_ExecutorStart(). At this point, obtaining the lock on the
> > partition whose index has been dropped invalidates the plan, which the
> > hook detects and reports. It then also reports the successful
> > re-execution of an updated plan that no longer references the dropped
> > index. Hmm.
> >
>
> It's not clear to me why the change disables this testing, and I can't
> try without a patch. Could you explain?

Sorry, PFA the delta patch for the change I described above. It
applies on top of v58 series of patches that I posted yesterday.
You'll notice that delay_execution test fails if you apply and do
check-world.

As for how the change breaks the testing, here is a before and after
of the flow of a isolation test in
src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
the session used to run a cached plan, s2 to perform concurrent DDL
that invalidates the plan):

* Before (working):

1. s2 takes advisory lock
2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
advisory lock
3. s2 drops an index referenced in the plan
4. s2 unlocks advisory lock
5. s1 locks unpruned partitions -> detects plan invalidation due to
dropped index.

* After (stops working because initial pruning and locking are done
before calling ExecutorStart_hook):

1. s2 takes advisory lock
2. s1 runs cached plan -> locks unpruned partitions -> goes to
ExecutorStart_hook to get advisory lock -> waits for advisory lock
3. s2 drops an index referenced in the plan -> waits for lock on the
unpruned partition -> deadlock!

One idea I had after sending the email yesterday is to introduce
ExecutorStartCachedPlan_hook for the advisory lock based waiting.
ExecutorStartCachedPlan() is the new function that you will find in
v58-0004 that wraps ExecutorStart() to handle plan invalidation. This
new hook would be called before ExecutorStartCachedPlan() calls
ExecutorStart(), so the original testing flow can still work.

Another idea might be to use injection points infra to introduce the
wait instead of the combination of a executor hook and advisory lock.

--
Thanks, Amit Langote

Attachment Content-Type Size
pruning-in-ExecutorStart.diff application/octet-stream 13.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-12-06 08:23:00 RE: Memory leak in WAL sender with pgoutput (v10~)
Previous Message Andrei Lepikhov 2024-12-06 08:13:00 Re: Considering fractional paths in Append node