From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Seq Scan |
Date: | 2015-10-23 05:03:27 |
Message-ID: | CA+TgmoYxoEFe6T_RwUv-b4S=5piqrVAWsw97aeycbh4=afXH9w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 20, 2015 at 3:04 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I have rebased the partial seq scan patch based on the above committed
> patches. Now for rescanning it reuses the dsm and to achieve that we
> need to ensure that workers have been completely shutdown and then
> reinitializes the dsm. To ensure complete shutdown of workers, current
> function WaitForParallelWorkersToFinish is not sufficient as that just
> waits for the last message to receive from worker backend, so I have
> written a new function WaitForParallelWorkersToDie. Also on receiving
> 'X' message in HandleParallelMessage, it just frees the worker handle
> without ensuring if the worker is died due to which later it will be
> difficult
> to even find whether worker is died or not, so I have removed that code
> from HandleParallelMessage. Another change is that after receiving last
> tuple in Gather node, it just shutdown down the workers without
> destroying the dsm.
+ /*
+ * We can't finish transaction commit or abort until all of the
+ * workers are dead. This means, in particular, that
we can't respond
+ * to interrupts at this stage.
+ */
+ HOLD_INTERRUPTS();
+ status =
WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
+ RESUME_INTERRUPTS();
These comments are correct when this code is called from
DestroyParallelContext(), but they're flat wrong when called from
ReinitializeParallelDSM(). I suggest moving the comment back to
DestroyParallelContext and following it with this:
HOLD_INTERRUPTS();
WaitForParallelWorkersToDie();
RESUME_INTERRUPTS();
Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie() itself.
This hunk is a problem:
case 'X': /* Terminate,
indicating clean exit */
{
- pfree(pcxt->worker[i].bgwhandle);
pfree(pcxt->worker[i].error_mqh);
- pcxt->worker[i].bgwhandle = NULL;
pcxt->worker[i].error_mqh = NULL;
break;
}
If you do that on receipt of the 'X' message, then
DestroyParallelContext() might SIGTERM a worker that has supposedly
exited cleanly. That seems bad. I think maybe the solution is to
make DestroyParallelContext() terminate the worker only if
pcxt->worker[i].error_mqh != NULL. So make error_mqh == NULL mean a
clean loss of a worker: either we couldn't register it, or it exited
cleanly. And bgwhandle == NULL would mean it's actually gone.
It makes sense to have ExecShutdownGather and
ExecShutdownGatherWorkers, but couldn't the former call the latter
instead of duplicating the code?
I think ReInitialize should be capitalized as Reinitialize throughout.
ExecParallelReInitializeTupleQueues is almost a cut-and-paste
duplicate of ExecParallelSetupTupleQueues. Please refactor this to
avoid duplication - e.g. change
ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second
argument bool reinit. ExecParallelReInitializeTupleQueues can just do
ExecParallelSetupTupleQueues(pxct, true).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2015-10-23 05:25:11 | Re: Parallel Seq Scan |
Previous Message | Tom Lane | 2015-10-23 04:31:29 | Re: Parallel Seq Scan |