Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

From: Francesco Degrassi <francesco(dot)degrassi(at)optionfactory(dot)net>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date: 2024-09-16 19:35:13
Message-ID: CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
Lists: pgsql-bugs

A leader backend process hangs forever while executing certain queries in
parallel mode. `pg_stat_activity` shows the leader stuck on
IPC/ParallelFinish, with no trace of any parallel workers. Nothing is
reported in the frontend output or in the server logs. Ctrl-C does not
terminate the query; the backend has to be killed with SIGKILL.

We originally stumbled into this while developing a foreign data
wrapper: we use SPI to run queries during foreign scan operations, and
the backend hung whenever such an SPI query was given a parallel plan.

We then found multiple reports of leader backends stuck on
IPC/ParallelFinish on the pgsql-hackers and pgsql-bugs mailing lists;
see the references below. We eventually put together a reliable
reproducer that does not require any extension.

Test system
===========

- Architecture: x86_64
- OS: Linux Mint 21.1 Vera
- Tested postgres version(s):
- latest 16 (16.4)
- latest 15 (15.8)
- master @ 4eada203a5a871f893afe3eb3e07eea5de1aa642

Steps to reproduce
==================

The repro SQL script is attached as `parallel_hang_repro.sql`. On
PostgreSQL versions before 16, please replace

SET debug_parallel_query=on;

with

SET force_parallel_mode=on;

Details
=======

Setup
-----

hang=# CREATE FUNCTION my_cmp (int4, int4)
RETURNS int LANGUAGE sql AS
$$
SELECT
CASE WHEN $1 < $2 THEN -1
WHEN $1 > $2 THEN 1
ELSE 0
END;
$$;
CREATE FUNCTION
hang=# CREATE TABLE parallel_hang (i int4);
CREATE TABLE
hang=# INSERT INTO parallel_hang
(SELECT * FROM generate_series(1, 400) gs);
INSERT 0 400
hang=# CREATE OPERATOR CLASS int4_custom_ops FOR TYPE int4 USING btree
AS
OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
OPERATOR 5 > (int4, int4), FUNCTION 1 my_cmp(int4, int4);
CREATE OPERATOR CLASS
hang=# CREATE UNIQUE INDEX parallel_hang_idx
ON parallel_hang
USING btree (i int4_custom_ops);
CREATE INDEX

Hanging query
-------------

hang=# SET debug_parallel_query=on;
SET
hang=# DELETE FROM parallel_hang WHERE 380 <= i AND i <= 420;

The query never returns. Ctrl-C does not terminate the query.

Output from `pg_stat_activity`
------------------------------

The only non-system entry (apart from the backend used for querying
`pg_stat_activity`) is the following. No parallel worker backends are
present.

See attached `pg_stat_activity.out` for full output.

postgres=# select * from pg_stat_activity ;
-[ RECORD 4 ]----+-------------------------------------------------------------
datid            | 5
datname          | postgres
pid              | 705564
leader_pid       |
usesysid         | 10
usename          | fdegrassi
application_name | psql
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 46538
backend_start    | 2024-09-16 12:10:46.282769+02
xact_start       | 2024-09-16 12:12:19.93757+02
query_start      | 2024-09-16 12:12:19.93757+02
state_change     | 2024-09-16 12:12:19.93757+02
wait_event_type  | IPC
wait_event       | ParallelFinish
state            | active
backend_xid      |
backend_xmin     | 747
query_id         |
query            | DELETE FROM parallel_hang WHERE 380 <= i AND i <= 420;
backend_type     | client backend

gdb session
-----------

The leader backend is stuck in `WaitForParallelWorkersToFinish`; see the
attached `backtrace.out` for full details.

(gdb) bt
#0 0x000071e4efd25dea in epoll_wait at ..../linux/epoll_wait.c:30
#1 0x00005ff2f2040ef6 in WaitEventSetWaitBlock at latch.c:1535
#2 0x00005ff2f2040e0d in WaitEventSetWait at latch.c:1481
#3 0x00005ff2f20401cb in WaitLatch at latch.c:513
#4 0x00005ff2f1bfd4cd in WaitForParallelWorkersToFinish at parallel.c:863
#5 0x00005ff2f1df6a9f in ExecParallelFinish at execParallel.c:1160
#6 0x00005ff2f1e198a5 in ExecShutdownGatherWorkers at nodeGather.c:404
#7 0x00005ff2f1e198f6 in ExecShutdownGather at nodeGather.c:421
#8 0x00005ff2f1dfbbcd in ExecShutdownNode_walker at execProcnode.c:804
#9 0x00005ff2f1dfbb1a in ExecShutdownNode at execProcnode.c:775
#10 0x00005ff2f1df22fc in ExecutePlan at execMain.c:1728
#11 0x00005ff2f1defe92 in standard_ExecutorRun at execMain.c:365
#12 0x00005ff2f1defd19 in ExecutorRun at execMain.c:309
#13 0x00005ff2f1e06af2 in postquel_getnext at functions.c:897
#14 0x00005ff2f1e0716d in fmgr_sql at functions.c:1198
#15 0x00005ff2f2255376 in FunctionCall2Coll at fmgr.c:1132
#16 0x00005ff2f1bc8259 in _bt_compare at nbtsearch.c:744
#17 0x00005ff2f1bc7c81 in _bt_binsrch at nbtsearch.c:403
#18 0x00005ff2f1bc7892 in _bt_search at nbtsearch.c:153
#19 0x00005ff2f1bc8e6a in _bt_first at nbtsearch.c:1377
#20 0x00005ff2f1bc542d in btgetbitmap at nbtree.c:310
#21 0x00005ff2f1bb892a in index_getbitmap at indexam.c:724
#22 0x00005ff2f1e15df6 in MultiExecBitmapIndexScan at nodeBitmapIndexscan.c:105
#23 0x00005ff2f1dfb6ee in MultiExecProcNode at execProcnode.c:524
#24 0x00005ff2f1e14626 in BitmapHeapNext at nodeBitmapHeapscan.c:113
#25 0x00005ff2f1dff7af in ExecScanFetch at execScan.c:132
#26 0x00005ff2f1dff854 in ExecScan at execScan.c:198
#27 0x00005ff2f1e152cd in ExecBitmapHeapScan at nodeBitmapHeapscan.c:592
#28 0x00005ff2f1dfb5da in ExecProcNodeFirst at execProcnode.c:464
#29 0x00005ff2f1e32023 in ExecProcNode at .../executor/executor.h:273
#30 0x00005ff2f1e379bd in ExecModifyTable at nodeModifyTable.c:3676
#31 0x00005ff2f1dfb5da in ExecProcNodeFirst at execProcnode.c:464
#32 0x00005ff2f1def8fe in ExecProcNode at .../executor/executor.h:273
#33 0x00005ff2f1df222b in ExecutePlan at execMain.c:1670
#34 0x00005ff2f1defe92 in standard_ExecutorRun at execMain.c:365
#35 0x00005ff2f1defd19 in ExecutorRun at execMain.c:309
#36 0x00005ff2f2078d98 in ProcessQuery at pquery.c:160
#37 0x00005ff2f207a732 in PortalRunMulti at pquery.c:1277
#38 0x00005ff2f2079cc8 in PortalRun at pquery.c:791
#39 0x00005ff2f2072cfd in exec_simple_query at postgres.c:1278
#40 0x00005ff2f2077d00 in PostgresMain at postgres.c:4701
#41 0x00005ff2f1faea08 in BackendRun at postmaster.c:4464
#42 0x00005ff2f1fae294 in BackendStartup at postmaster.c:4192
#43 0x00005ff2f1faa8c5 in ServerLoop at postmaster.c:1782
#44 0x00005ff2f1faa179 in PostmasterMain at postmaster.c:1466
#45 0x00005ff2f1e6f91e in main at main.c:198

Examining `InterruptHoldoffCount` shows that interrupts are being held off.

(gdb) p InterruptHoldoffCount
$1 = 1

Attaching gdb to the parallel worker, on the other hand, shows that it
completes its work successfully and exits.

Analysis
========

The problem appears to manifest when a backend starts a query while
holding an LWLock and the planner chooses a parallel plan for that
query.

Holding the LWLock means interrupts are held off (`InterruptHoldoffCount`
is > 0), which makes `ProcessInterrupts()` bail out immediately and
therefore never call `HandleParallelMessages()`. The leader then waits
forever inside `WaitForParallelWorkersToFinish`, even though the
parallel workers have terminated successfully, because their completion
messages are delivered but never processed.
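
For reference, here is a paraphrased sketch of the gate at the top of
`ProcessInterrupts()` (src/backend/tcop/postgres.c). The globals and
`HandleParallelMessages()` are the real names; the wrapper function is
purely illustrative and not a verbatim copy of the backend code:

#include "postgres.h"
#include "access/parallel.h"    /* ParallelMessagePending, HandleParallelMessages() */
#include "miscadmin.h"          /* InterruptHoldoffCount, CritSectionCount */

/*
 * Illustrative paraphrase, not verbatim.  LWLockAcquire() does
 * HOLD_INTERRUPTS() (i.e. InterruptHoldoffCount++), so while any LWLock is
 * held the early return below is taken and the parallel workers'
 * completion messages are never drained.
 */
static void
sketch_of_ProcessInterrupts(void)
{
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;                 /* taken for the whole btree descent here */

    /* much further down, the part that never runs in this scenario: */
    if (ParallelMessagePending)
        HandleParallelMessages();
}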

To dig further, we re-ran the query with `trace_lwlocks=on` and manually
paired the acquire/release messages; this allowed us to identify the
LWLock held by the leader at the time of the hang.

In the repro above, the LWLock appears to be a BufferContent lock
acquired by `_bt_getroot()`; `_bt_first()` then reaches the custom
comparison function `my_cmp`, which in turn executes its SQL SELECT, for
which the planner chooses a parallel plan.

(In our original case, our custom foreign data wrapper held an LWLock
directly while running a query via SPI; when that query was planned in
parallel, the same hang was triggered.)
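
For illustration, that FDW pattern boils down to something like the
following minimal sketch. The lock, the function name and the query are
hypothetical; only the server APIs (`LWLockAcquire`, `LWLockRelease` and
the SPI calls) are real:

#include "postgres.h"
#include "executor/spi.h"
#include "storage/lwlock.h"

/*
 * Hypothetical sketch of the FDW code path described above.
 * LWLockAcquire() calls HOLD_INTERRUPTS(), so InterruptHoldoffCount stays
 * above zero until the matching LWLockRelease(); if the SPI query below is
 * planned in parallel, the leader can no longer process the workers'
 * completion messages and hangs in WaitForParallelWorkersToFinish().
 */
static void
fdw_scan_step(LWLock *fdw_state_lock)           /* hypothetical FDW-owned lock */
{
    LWLockAcquire(fdw_state_lock, LW_SHARED);   /* interrupts now held off */

    SPI_connect();
    /* hypothetical query; the hang occurs if it gets a parallel plan */
    SPI_execute("SELECT count(*) FROM some_big_table", true, 0);
    SPI_finish();

    LWLockRelease(fdw_state_lock);              /* never reached once hung */
}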

Potential fixes
---------------

As an experiment, we modified the planner to take `InterruptHoldoffCount`
into account when setting `glob->parallelModeOK`: if
`InterruptHoldoffCount` > 0, `parallelModeOK` is forced to false.

This ensures that a sequential plan is chosen whenever the leader
backend is holding off interrupts, and the query then completes
normally.

The patch is attached as `no_parallel_on_interrupts_held.patch`.
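
Roughly, the change has the following shape: a paraphrased excerpt of
the `parallelModeOK` block in `standard_planner()`
(src/backend/optimizer/plan/planner.c) with one added condition. The
attached patch is authoritative and may differ in detail:

    /* paraphrased excerpt, not a verbatim diff */
    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        parse->commandType == CMD_SELECT &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker() &&
        InterruptHoldoffCount == 0)     /* new: interrupts must not be held off */
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }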

Related issues
==============

- Query stuck with wait event IPC / ParallelFinish
  https://www.postgresql.org/message-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel@j-davis.com
- BUG #18586: Process (and transaction) is stuck in IPC when the DB is
  under high load
  https://www.postgresql.org/message-id/flat/18586-03e1535b1b34db81%40postgresql.org

Attachment Content-Type Size
parallel_hang_repro.sql application/octet-stream 763 bytes
no_parallel_on_interrupts_held.patch application/octet-stream 562 bytes
backtrace.out application/octet-stream 5.6 KB
pg_stat_activity.out application/octet-stream 3.8 KB
