From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Francesco Degrassi <francesco(dot)degrassi(at)optionfactory(dot)net>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start |
Date: | 2024-11-08 17:56:55 |
Message-ID: | 3793541.1731088615@sss.pgh.pa.us |
Lists: | pgsql-bugs |
I wrote:
> Here's a proposed patch along that line. I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.
I'd modeled that on the existing recovery code for DSM segment creation
failure, just below. But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well. To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0. That results in a bunch of unsurprising
regression test diffs, plus a couple of
+ERROR: could not find key 4 in shm TOC at 0x229f138
which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.
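For readers outside the PostgreSQL tree, here is a toy model of the failure mode described above. It is a self-contained C sketch, not PostgreSQL source. ToyToc, ToyParallelContext, init_context, node_initialize and node_reinitialize are hypothetical names. It assumes, as the message suggests, that setup falls back to zero workers when no real shared segment is available, and that the per-node setup step then skips creating its TOC entry. A re-initialization step that does not apply the same guard looks up a key that was never inserted. That is the shape of the "could not find key 4 in shm TOC" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a parallel context and its table of
 * contents (TOC); these are NOT PostgreSQL's real structs. */
#define TOC_MAX_KEYS 8

typedef struct ToyToc
{
    int   keys[TOC_MAX_KEYS];
    void *values[TOC_MAX_KEYS];
    int   nentries;
} ToyToc;

typedef struct ToyParallelContext
{
    int    nworkers;      /* workers we will actually try to launch */
    bool   have_segment;  /* false => no real shared segment */
    ToyToc toc;
} ToyParallelContext;

static void toc_insert(ToyToc *toc, int key, void *value)
{
    assert(toc->nentries < TOC_MAX_KEYS);
    toc->keys[toc->nentries] = key;
    toc->values[toc->nentries] = value;
    toc->nentries++;
}

/* Returns NULL for a missing key if no_error is true; otherwise reports
 * the failure and exits, mimicking "could not find key N in shm TOC". */
static void *toc_lookup(const ToyToc *toc, int key, bool no_error)
{
    for (int i = 0; i < toc->nentries; i++)
        if (toc->keys[i] == key)
            return toc->values[i];
    if (no_error)
        return NULL;
    fprintf(stderr, "ERROR: could not find key %d in TOC\n", key);
    exit(1);
}

/* Setup: degrade to zero workers when no shared segment is available
 * (or when forced, as in the nworkers = 0 experiment above). */
static void init_context(ToyParallelContext *pcxt, int nworkers, bool segment_ok)
{
    pcxt->toc.nentries = 0;
    pcxt->have_segment = segment_ok && nworkers > 0;
    pcxt->nworkers = pcxt->have_segment ? nworkers : 0;
}

static int shared_state = 0;  /* hypothetical per-node shared state */

/* Per-node setup: skips the shared state entirely when there is no real
 * segment, so no TOC entry is created for this node. */
static void node_initialize(ToyParallelContext *pcxt, int node_id)
{
    if (!pcxt->have_segment)
        return;
    toc_insert(&pcxt->toc, node_id, &shared_state);
}

/* Per-node re-initialization (e.g. for a rescan): must apply the same
 * guard as node_initialize, or the lookup hits a never-inserted key. */
static bool node_reinitialize(ToyParallelContext *pcxt, int node_id)
{
    if (!pcxt->have_segment)
        return false;  /* nothing was set up, so nothing to reset */
    int *state = toc_lookup(&pcxt->toc, node_id, false);
    *state = 0;
    return true;
}
```

Calling init_context(&pcxt, 4, false) and then node_initialize and node_reinitialize for node 4 leaves the TOC empty, and the re-initialization simply returns false. Dropping the have_segment check from node_reinitialize reproduces the lookup error.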
I'm also not happy about ReinitializeParallelWorkers'
Assert(pcxt->nworkers >= nworkers_to_launch);
The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.
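A minimal sketch of the alternative suggested above: the re-initialization routine clamps the request itself instead of asserting, so callers need not pre-reduce it. This is a hypothetical C model (ToyWorkerContext, reinitialize_workers and TOY_MIN are invented for illustration), not the code in the attached patch.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a parallel context; not the
 * real PostgreSQL ParallelContext. */
typedef struct ToyWorkerContext
{
    int nworkers;            /* workers the context was set up for */
    int nworkers_to_launch;  /* workers to launch on the next rescan */
} ToyWorkerContext;

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* A caller may ask for more workers than the context holds, e.g. when
 * setup fell back to zero workers. Clamp here rather than asserting
 * nworkers >= nworkers_to_launch, so no caller has to do it first. */
static void reinitialize_workers(ToyWorkerContext *pcxt, int nworkers_to_launch)
{
    assert(nworkers_to_launch >= 0);
    pcxt->nworkers_to_launch = TOY_MIN(pcxt->nworkers, nworkers_to_launch);
}
```

With this shape, a context that degraded to zero workers quietly launches none, whatever the caller requested.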
So I ended up with the attached. There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v2-better-fix-for-noninterruptible-lockup.patch | text/x-diff | 3.7 KB |