From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Armin Schöffmann <armin(dot)schoeffmann(at)aegaeon(dot)de> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PATCH: pg_restore parallel-execution-deadlock issue |
Date: | 2016-04-08 09:24:20 |
Message-ID: | CAB7nPqQbwjMG_hW-Ff3=Sn4suvBQ4gxD-8D2_GLc3hN5o9TR0A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 5, 2016 at 9:28 AM, Armin Schöffmann
<armin(dot)schoeffmann(at)aegaeon(dot)de> wrote:
> I propose the below patches to parallels.c and pg_backup_utils.c fixing deadlocks in pg_restore (windows only) if running more than 2 parallel jobs.
> This problem was reported by me earlier this year.
> http://www.postgresql.org/message-id/20160307161619.25731.78653@wrigleys.postgresql.org
Yes, I recall that... It is one of the things that I have bookmarked
on my box and that I wanted to look at at some point.. Well now's the
time.
> - Winsock's "recv(...)" called in piperead() is a blocking read by default, therefor, signalizing termEvent as used in ShutdownWorkersHard() is not enough to make worker-threads go away.
> We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO in this case.
> Otherwise, the main-thread will wait forever, if more than one additional worker is active (e.g. option -j3) and a premature EOF occurs in the input-file.
/* The workers monitor this event via checkAborting(). */
SetEvent(termEvent);
+
+ /* Disable send and receive on the given socket */
+ for (i = 0; i < pstate->numWorkers; i++)
+ shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
#endif
Looking at this code, it is indeed tricky. We cannot just close the
sockets because of the blocking call emulated in WIN32's piperead
added in parallel.c, and it is necessary to be in line with the
termination event. This really meritates a comment in the code. I
added one in the patch attached.
> Findings in pg_backup_utils.c/ parallels.c, which could impact other tools, too:
> - threads created with _beginthreadex need to be exited by either a "return exitcode" or "_endthreadex(exitcode)". It might be obsolete in fire-and-forget-scenarios, but it matters in other cases.
> As of current, pg_backup_utils uses EndThread to retire additional worker-threads., which are spawned by _beginthreadex in parallel.c. The corresponding call for ExitThread would be CreateThread,
> nevertheless, _beginthreadex is the correct choice here, as we do call-out into CRT and need to retain the thread-handle for after-death synchronization with the main-thread.
> The thread-handle needs to be closed explicitly.
This is as well explained here:
https://msdn.microsoft.com/en-us/library/kdzttdcb.aspx
"endthread and _endthreadex reclaim allocated thread resources and
then call ExitThread."
#ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- ExitThread(code);
+ _endthreadex(code);
#endif
This is indeed the right thing to do per the docs if _beginthreadex
has been called to initialize it.
for (j = 0; j < pstate->numWorkers; j++)
+ {
if (pstate->parallelSlot[j].hThread == hThread)
+ {
slot = &(pstate->parallelSlot[j]);
+ CloseHandle(hThread);
+ }
+ }
OK for closing the handle here. You are missing a cast to HANDLE here
actually or this code generates a warning.
> If this is not the correct place to discuss patches, I'd be glad if somebody can notify the tool's maintainer, to take a look into it.
Here or -bugs are correct places to discuss such issues. People doing
from time to time work with Windows hang up on the two lists.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
win32-dump-parallel.patch | invalid/octet-stream | 1.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Christian Ullrich | 2016-04-08 10:19:08 | Lower msvc build verbosity level |
Previous Message | Masahiko Sawada | 2016-04-08 08:07:21 | Re: Support for N synchronous standby servers - take 2 |