From: | Armin Schöffmann <armin(dot)schoeffmann(at)aegaeon(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org <pgsql-hackers(at)postgresql(dot)org> |
Subject: | PATCH: pg_restore parallel-execution-deadlock issue |
Date: | 2016-04-05 00:28:45 |
Message-ID: | zarafa.570306bd.3418.074bf1420d8f2ba2@root.aegaeon.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
worthy hackers,
I propose the below patches to parallels.c and pg_backup_utils.c fixing deadlocks in pg_restore (windows only) if running more than 2 parallel jobs.
This problem was reported by me earlier this year.
http://www.postgresql.org/message-id/20160307161619.25731.78653@wrigleys.postgresql.org
- Winsock's "recv(...)" called in piperead() is a blocking read by default, therefor, signalizing termEvent as used in ShutdownWorkersHard() is not enough to make worker-threads go away.
We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO in this case.
Otherwise, the main-thread will wait forever, if more than one additional worker is active (e.g. option -j3) and a premature EOF occurs in the input-file.
Findings in pg_backup_utils.c/ parallels.c, which could impact other tools, too:
- threads created with _beginthreadex need to be exited by either a "return exitcode" or "_endthreadex(exitcode)". It might be obsolete in fire-and-forget-scenarios, but it matters in other cases.
As of current, pg_backup_utils uses EndThread to retire additional worker-threads., which are spawned by _beginthreadex in parallel.c. The corresponding call for ExitThread would be CreateThread,
nevertheless, _beginthreadex is the correct choice here, as we do call-out into CRT and need to retain the thread-handle for after-death synchronization with the main-thread.
The thread-handle needs to be closed explicitly.
If this is not the correct place to discuss patches, I'd be glad if somebody can notify the tool's maintainer, to take a look into it.
best regards,
Armin Schöffmann.
--
Aegaeon technologies GmbH
phone: +49.941.8107344
fax: +49.941.8107356
Legal disclaimer:
http://aegaeon.de/disclaimer/email_all_int.txt
parallel.c
@@ -350,7 +350,8 @@ static void
ShutdownWorkersHard(ParallelState *pstate)
{
-#ifndef WIN32
+
int i;
+#ifndef WIN32
signal(SIGPIPE, SIG_IGN);
@@ -367,4 +368,7 @@ ShutdownWorkersHard(ParallelState *pstate)
/* The workers monitor this event via checkAborting(). */
SetEvent(termEvent);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
#endif
@@ -408,5 +412,8 @@ WaitForTerminatingWorkers(ParallelState *pstate)
for (j = 0; j < pstate->numWorkers; j++)
if (pstate->parallelSlot[j].hThread == hThread)
+ {
slot = &(pstate->parallelSlot[j]);
+ CloseHandle(hThread);
+ }
free(lpHandles);
pg_backup_utils.c
@@ -120,5 +120,5 @@ exit_nicely(int code)
#ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- ExitThread(code);
+ _endthreadex(code);
#endif
Attachment | Content-Type | Size |
---|---|---|
parallel.c.diff | application/octet-stream | 1.1 KB |
pg_backup_utils.c.diff | application/octet-stream | 377 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-04-05 01:00:41 | Re: Yet another small patch - reorderbuffer.c:1099 |
Previous Message | Michael Paquier | 2016-04-05 00:18:28 | Re: pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server |