| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Parallel Seq Scan | 
| Date: | 2015-10-23 11:11:31 | 
| Message-ID: | CAA4eK1Ka54gKh00QNYjwy2vUwW55+Bgw=eZjR3D-gRVyikD4Ug@mail.gmail.com | 
| Lists: | pgsql-hackers | 
On Fri, Oct 23, 2015 at 10:33 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> +               /*
> +                * We can't finish transaction commit or abort until all of the
> +                * workers are dead.  This means, in particular, that we can't respond
> +                * to interrupts at this stage.
> +                */
> +               HOLD_INTERRUPTS();
> +               status = WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
> +               RESUME_INTERRUPTS();
>
> These comments are correct when this code is called from
> DestroyParallelContext(), but they're flat wrong when called from
> ReinitializeParallelDSM().  I suggest moving the comment back to
> DestroyParallelContext and following it with this:
>
> HOLD_INTERRUPTS();
> WaitForParallelWorkersToDie();
> RESUME_INTERRUPTS();
>
> Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie() itself.
>
Changed as per suggestion.
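
For readers following along, here is a minimal sketch of the arrangement described above. It is a toy model, not the patch itself: the macros, the counter, and the function bodies are simplified stand-ins, and holdoff_seen_in_wait is a hypothetical variable added only so the effect can be observed.

```c
#include <assert.h>

/* Stand-ins for the backend's interrupt holdoff machinery (toy model). */
static int InterruptHoldoffCount = 0;
#define HOLD_INTERRUPTS()   (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS() \
    do { assert(InterruptHoldoffCount > 0); InterruptHoldoffCount--; } while (0)

/* Hypothetical probe: holdoff depth observed while waiting for workers. */
static int holdoff_seen_in_wait = -1;

/* No HOLD/RESUME here any more; the caller decides. */
static void
WaitForParallelWorkersToDie(void)
{
    holdoff_seen_in_wait = InterruptHoldoffCount;
}

static void
DestroyParallelContext(void)
{
    /*
     * We can't finish transaction commit or abort until all of the
     * workers are dead, so interrupts are held across the wait.
     */
    HOLD_INTERRUPTS();
    WaitForParallelWorkersToDie();
    RESUME_INTERRUPTS();
}

static void
ReinitializeParallelDSM(void)
{
    /* Not at commit/abort time, so the wait stays interruptible. */
    WaitForParallelWorkersToDie();
}
```

In this model only the DestroyParallelContext() path waits with interrupts held; the ReinitializeParallelDSM() path waits with the holdoff count at zero.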
> This hunk is a problem:
>
>                 case 'X':                               /* Terminate, indicating clean exit */
>                         {
> -                               pfree(pcxt->worker[i].bgwhandle);
>                                 pfree(pcxt->worker[i].error_mqh);
> -                               pcxt->worker[i].bgwhandle = NULL;
>                                 pcxt->worker[i].error_mqh = NULL;
>                                 break;
>                         }
>
> If you do that on receipt of the 'X' message, then
> DestroyParallelContext() might SIGTERM a worker that has supposedly
> exited cleanly.  That seems bad.  I think maybe the solution is to
> make DestroyParallelContext() terminate the worker only if
> pcxt->worker[i].error_mqh != NULL.
Changed as per suggestion.
>   So make error_mqh == NULL mean a
> clean loss of a worker: either we couldn't register it, or it exited
> cleanly.  And bgwhandle == NULL would mean it's actually gone.
>
I think that even if error_mqh is NULL, it is not guaranteed that the worker
has exited; it only ensures that a clean worker shutdown is either in progress
or done.
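
To illustrate the two fields discussed above, here is a rough sketch under stated assumptions. ToyWorker, the toy_* functions, and the terminated flag are hypothetical names made up for this example; real code would free the handles and signal the background worker rather than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one worker slot. */
typedef struct ToyWorker
{
    void   *bgwhandle;   /* non-NULL until we know the worker is gone */
    void   *error_mqh;   /* NULL once a clean shutdown has started or finished */
    bool    terminated;  /* toy flag: a terminate request was sent */
} ToyWorker;

/* On the 'X' message, drop only the error queue; keep bgwhandle. */
static void
toy_handle_terminate_message(ToyWorker *w)
{
    w->error_mqh = NULL;
}

/* Returns the number of workers that had to be asked to terminate. */
static int
toy_destroy_context(ToyWorker *workers, int nworkers)
{
    int     signalled = 0;
    int     i;

    /* Terminate only workers that have not announced a clean exit. */
    for (i = 0; i < nworkers; i++)
    {
        if (workers[i].error_mqh != NULL)
        {
            workers[i].terminated = true;
            workers[i].error_mqh = NULL;
            signalled++;
        }
    }

    /*
     * error_mqh == NULL does not prove the worker has exited, so still
     * wait on every remaining handle before forgetting it.
     */
    for (i = 0; i < nworkers; i++)
    {
        if (workers[i].bgwhandle != NULL)
            workers[i].bgwhandle = NULL;    /* stand-in for wait + free */
    }
    return signalled;
}
```

The point of the second loop is the caveat above: a worker that sent 'X' is skipped by the terminate step, but it is still waited for through its bgwhandle.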
> It makes sense to have ExecShutdownGather and
> ExecShutdownGatherWorkers, but couldn't the former call the latter
> instead of duplicating the code?
>
Makes sense, so changed accordingly.
> I think ReInitialize should be capitalized as Reinitialize throughout.
>
Changed as per suggestion.
> ExecParallelReInitializeTupleQueues is almost a cut-and-paste
> duplicate of ExecParallelSetupTupleQueues.  Please refactor this to
> avoid duplication - e.g. change
> ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second
> argument bool reinit. ExecParallelReInitializeTupleQueues can just do
> ExecParallelSetupTupleQueues(pcxt, true).
>
Changed as per suggestion.
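
The shape of that refactoring can be sketched as follows. This is a toy model only: the static buffer stands in for the shared memory segment, memset stands in for creating a queue in place, and every toy_* name and constant is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NWORKERS   3
#define TOY_QUEUE_SIZE 64

/* Stand-in for the shared memory region that holds the tuple queues. */
static char toy_dsm[TOY_NWORKERS * TOY_QUEUE_SIZE];
static bool toy_allocated = false;
static int  toy_alloc_calls = 0;

/*
 * One routine for both cases: allocate the space on first setup, or
 * look up the existing space when reinit is true; then (re)create
 * every per-worker queue in place.
 */
static char *
toy_setup_tuple_queues(bool reinit)
{
    int     i;

    if (!reinit)
    {
        toy_alloc_calls++;          /* first-time allocation */
        toy_allocated = true;
    }
    else
        assert(toy_allocated);      /* reuse what setup allocated */

    for (i = 0; i < TOY_NWORKERS; i++)
        memset(toy_dsm + i * TOY_QUEUE_SIZE, 0, TOY_QUEUE_SIZE);

    return toy_dsm;
}

/* The reinitialize entry point becomes a thin wrapper. */
static char *
toy_reinitialize_tuple_queues(void)
{
    return toy_setup_tuple_queues(true);
}
```

The only difference between the two paths is how the space is obtained; the per-queue loop is shared, which is what removes the duplication.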
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
| Attachment | Content-Type | Size | 
|---|---|---|
| parallel_seqscan_partialseqscan_v23.patch | application/octet-stream | 62.2 KB | 