Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Francesco Degrassi <francesco(dot)degrassi(at)optionfactory(dot)net>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date: 2024-09-18 05:04:34
Message-ID: 20240918050434.13.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> >> The problem appears to manifest when a backend is holding an LWLock and
> >> starting a query, and the planner chooses a parallel plan for the
> >> latter.
>
> > Thanks for the detailed report and for the fix.
>
> I do not have much faith in this patch. It assumes that the
> condition "interrupts can be processed" is the same at plan time and
> execution time. For plans extracted from the plan cache, there seems
> little reason to assume that must be true. The proposed test case
> cannot trigger that (today anyway) because SQL-language functions
> don't deal in cached plans, but I suspect it could be broken with a
> test case using a plpgsql function instead.

Good point. I missed that.

> I actually think that the problem is somewhere upstream of here:
> what in the world are we doing invoking planning and execution of
> arbitrary queries in a context where interrupts are held off?
> That seems certain to fail in multiple ways besides parallel
> workers not working. I think we need to fix whatever code
> believes that that could be OK. And maybe then insert
> "Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
> and executor start.

It would be nice to never run planning or execution with interrupts held. The
concrete examples so far involve btree operator classes with non-C support
functions. How practical would it be to release buffer content locks before
running index support functions? An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor). Non-C support functions have been handy for testing if
nothing else, though. Do non-bundled modules rely on non-C support functions?

Thanks,
nm

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-09-18 05:18:48 Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Previous Message Tom Lane 2024-09-18 04:23:42 Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start