Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

From: Noah Misch <noah(at)leadboat(dot)com>
To: Francesco Degrassi <francesco(dot)degrassi(at)optionfactory(dot)net>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date: 2024-09-18 03:01:59
Message-ID: 20240918030159.2a.nmisch@google.com
Lists: pgsql-bugs

On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> The problem appears to manifest when a backend is holding an LWLock and
> starting a query, and the planner chooses a parallel plan for the
> latter.

Thanks for the detailed report and for the fix.

> Potential fixes
> ---------------
>
> As an experiment, we modified the planner code to consider the state of
> `InterruptHoldoffCount` when determining the value of
> `glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
> is set to false.
>
> This ensures a sequential plan is executed if interrupts are being held
> on the leader backend, and the query completes normally.
>
> The patch is attached as `no_parallel_on_interrupts_held.patch`.

Looks good. An alternative would be something like the leader periodically
waking up to call HandleParallelMessages() outside of ProcessInterrupts(). I
like your patch better, though. Parallel query is a lot of infrastructure to
be running while immune to statement_timeout, pg_cancel_backend(), etc. I
opted to check INTERRUPTS_CAN_BE_PROCESSED(), since QueryCancelHoldoffCount!=0
doesn't cause the hang but still qualifies as a good reason to stay out of
parallel query. Pushed that way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ac04aa8
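
For readers following along, here is a minimal standalone sketch of the idea behind that check. It is a model, not the committed code: the counter and macro names mirror those in PostgreSQL's miscadmin.h, but the definitions are simplified, and parallel_ok() is a hypothetical helper standing in for the planner's parallelOK decision.

```c
/*
 * Simplified model of the interrupt-holdoff check; NOT the committed patch.
 * Counter and macro names follow PostgreSQL's miscadmin.h; parallel_ok()
 * is a hypothetical helper invented for this illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t InterruptHoldoffCount = 0;
static uint32_t QueryCancelHoldoffCount = 0;
static uint32_t CritSectionCount = 0;

/* LWLockAcquire() holds interrupts until the lock is released. */
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
	do { \
		assert(InterruptHoldoffCount > 0); \
		InterruptHoldoffCount--; \
	} while (0)

/* True only when no holdoff of any kind is in effect. */
#define INTERRUPTS_CAN_BE_PROCESSED() \
	(InterruptHoldoffCount == 0 && CritSectionCount == 0 && \
	 QueryCancelHoldoffCount == 0)

/*
 * Hypothetical stand-in for the planner's decision: allow a parallel plan
 * only if every other precondition holds AND the leader would be able to
 * process interrupts (and hence messages from its workers).
 */
static bool
parallel_ok(bool other_conditions_ok)
{
	return other_conditions_ok && INTERRUPTS_CAN_BE_PROCESSED();
}
```

In this model, parallel_ok(true) is false between HOLD_INTERRUPTS() and RESUME_INTERRUPTS(), which corresponds to planning a query while an LWLock is held. It is also false while QueryCancelHoldoffCount is nonzero, which is the extra case that checking only InterruptHoldoffCount would not cover.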

> Related issues
> ==============
>
> - Query stuck with wait event IPC / ParallelFinish
> -
> https://www.postgresql.org/message-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel@j-davis.com

This one didn't reproduce for me. Like your test, it involves custom code
running inside an opclass. I'm comfortable assuming it's the same problem.

> - BUG #18586: Process (and transaction) is stuck in IPC when the DB
> is under high load
> -
> https://www.postgresql.org/message-id/flat/18586-03e1535b1b34db81%40postgresql.org

Here, I'm not seeing enough detail to judge if it's the same. That's okay.
