Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Francesco Degrassi <francesco(dot)degrassi(at)optionfactory(dot)net>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date: 2024-09-19 02:53:48
Message-ID: 20240919025348.c5.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Sep 18, 2024 at 01:45:31AM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > What if we just ignored the plancache when uninterruptible? The new planner
> > check would then suffice.
>
> Only if you believe that parallel-query is the only problem,
> which is something I seriously doubt. I fear that the
> committed patch is just a band-aid over one symptom of the
> general problem that we can't permit arbitrary operations
> to be invoked from a btree comparison function. It's late
> here so I'm not sufficiently awake to think of examples,
> but I'm sure there are some.

I bet one can cause an (undetected) lwlock deadlock, but I don't know of a way
to do that "accidentally".

> However ... clearly a maliciously-coded btree support function can
> do arbitrarily bad stuff. We restrict creation of opclasses to
> superusers for exactly that reason. If our ambitions are only
> to prevent support functions from *accidentally* causing problems,
> is disabling parallel query enough? I'm still pretty uncomfortable
> about it, but it's less obviously insufficient than in the general
> case.

I'm okay stopping at blocking some known accidents. I'm not aware of an
approach that would block 100% of unwanted outcomes here without also removing
a feature folks would miss. Nice things about blocking parallel query:

- Cheap to block
- Always hangs if attempted, so blocking it doesn't take away something of value
- Query still completes, without parallelism
- Writers of opclass functions probably aren't thinking of parallel query

For what it's worth, I tried making standard_ExecutorStart() warn if
!INTERRUPTS_CAN_BE_PROCESSED(). Only this thread's new test and
004_verify_nbtree_unique reached the warning. (That's not a surprise.)

On Wed, Sep 18, 2024 at 08:59:22AM -0400, Peter Geoghegan wrote:
> The test case provided was intended to be illustrative of a problem
> that some foreign data wrapper ran into, when it used SPI.

Ideally, we'd block those or at least warn under assertions so FDW authors
don't accidentally run the executor with an LWLock held. Unlike the opclass
case, we so far don't have a valid use case for holding an LWLock there. In
other words, if the opclass use case is the only known-valid one, it would be
nice to have checks so new use cases don't creep in.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-09-19 03:30:24 Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
Previous Message Peter Geoghegan 2024-09-18 12:59:22 Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start