Re: pgsql: Support Parallel Append plan nodes.

From: amul sul <sulamul(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <rhaas(at)postgresql(dot)org>, pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Support Parallel Append plan nodes.
Date: 2017-12-06 06:15:17
Message-ID: CAAJ_b97Bfn32NJB_7P0mvgFWDmo+Ago7dPbKBr08eWKyprwHsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Thanks Tom for the crash analysis, I think this is the same reason that
Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
I tried to fix this crash[2] but missed the place where I have added assert
check in the assumption that we always coming from the previous check in the
while loop.

Instead, assert check we need a similar bailout logic[2] before looping back to
first partial plan. Attached patch does the same, I've tested with
parallel_leader_participation = off setting as suggested by Andres, make check
looks good except there is some obvious regression diff.

1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

Regards,
Amul

On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Robert Haas <rhaas(at)postgresql(dot)org> writes:
>>> Support Parallel Append plan nodes.
>
>> Buildfarm member sidewinder doesn't like this:
>
> It transpires that my machine prairiedog also shows the failure.
> I instrumented nodeAppend.c (as per attached patch) to see what
> was going on, and I get this trace on prairiedog:
>
> 2017-12-05 22:50:30.228 EST [28812] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.228 EST [28812] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.241 EST [28813] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.241 EST [28813] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.252 EST [28814] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.252 EST [28814] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.407 EST [28815] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.407 EST [28815] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.474 EST [28817] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.474 EST [28817] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520)
> 2017-12-05 22:50:38.211 EST [28158] LOG: background worker "parallel worker" (PID 28818) was terminated by signal 6: Abort trap
> 2017-12-05 22:50:38.211 EST [28158] DETAIL: Failed process was running: select round(avg(aa)), sum(aa) from a_star a3;
>
> That makes it pretty clear what's going wrong, but why don't we see it
> elsewhere?
>
> On my development machine, the same patch yields
>
> 2017-12-05 22:42:24.187 EST [843] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.187 EST [843] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [844] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [844] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [845] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [845] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.197 EST [846] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.197 EST [846] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [847] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [847] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [848] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [848] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.214 EST [849] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [849] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [852] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [852] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [853] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [853] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
>
> I conclude that the reason we don't see the crash except on the slowest
> buildfarm critters is that on most machines the leader process manages
> to finish off all the subplans before any of the workers get as far as
> doing something. This doesn't give me a warm fuzzy feeling about how
> much testing this patch has gotten.
>
> regards, tom lane
>

Attachment Content-Type Size
Parallel-Append-crash-fix.patch application/octet-stream 1.4 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Andres Freund 2017-12-06 08:03:19 Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Andres Freund 2017-12-06 04:17:15 Re: pgsql: Support Parallel Append plan nodes.

Browse pgsql-hackers by date

  From Date Subject
Next Message sanyam jain 2017-12-06 06:36:36 Error bundling script file with extension when event trigger commands are used
Previous Message Amit Kapila 2017-12-06 06:07:18 Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com