From: | amul sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <rhaas(at)postgresql(dot)org>, pgsql-committers(at)postgresql(dot)org |
Subject: | Re: pgsql: Support Parallel Append plan nodes. |
Date: | 2017-12-06 06:15:17 |
Message-ID: | CAAJ_b97Bfn32NJB_7P0mvgFWDmo+Ago7dPbKBr08eWKyprwHsA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Thanks Tom for the crash analysis, I think this is the same reason that
Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
I tried to fix this crash[2] but missed the place where I have added assert
check in the assumption that we always coming from the previous check in the
while loop.
Instead, assert check we need a similar bailout logic[2] before looping back to
first partial plan. Attached patch does the same, I've tested with
parallel_leader_participation = off setting as suggested by Andres, make check
looks good except there is some obvious regression diff.
1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com
Regards,
Amul
On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Robert Haas <rhaas(at)postgresql(dot)org> writes:
>>> Support Parallel Append plan nodes.
>
>> Buildfarm member sidewinder doesn't like this:
>
> It transpires that my machine prairiedog also shows the failure.
> I instrumented nodeAppend.c (as per attached patch) to see what
> was going on, and I get this trace on prairiedog:
>
> 2017-12-05 22:50:30.228 EST [28812] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.228 EST [28812] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.241 EST [28813] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.241 EST [28813] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.252 EST [28814] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.252 EST [28814] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.407 EST [28815] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.407 EST [28815] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.474 EST [28817] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.474 EST [28817] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520)
> 2017-12-05 22:50:38.211 EST [28158] LOG: background worker "parallel worker" (PID 28818) was terminated by signal 6: Abort trap
> 2017-12-05 22:50:38.211 EST [28158] DETAIL: Failed process was running: select round(avg(aa)), sum(aa) from a_star a3;
>
> That makes it pretty clear what's going wrong, but why don't we see it
> elsewhere?
>
> On my development machine, the same patch yields
>
> 2017-12-05 22:42:24.187 EST [843] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.187 EST [843] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [844] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [844] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [845] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [845] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.197 EST [846] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.197 EST [846] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [847] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [847] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [848] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [848] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.214 EST [849] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [849] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [852] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [852] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [853] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [853] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
>
> I conclude that the reason we don't see the crash except on the slowest
> buildfarm critters is that on most machines the leader process manages
> to finish off all the subplans before any of the workers get as far as
> doing something. This doesn't give me a warm fuzzy feeling about how
> much testing this patch has gotten.
>
> regards, tom lane
>
Attachment | Content-Type | Size |
---|---|---|
Parallel-Append-crash-fix.patch | application/octet-stream | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2017-12-06 08:03:19 | Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple |
Previous Message | Andres Freund | 2017-12-06 04:17:15 | Re: pgsql: Support Parallel Append plan nodes. |
From | Date | Subject | |
---|---|---|---|
Next Message | sanyam jain | 2017-12-06 06:36:36 | Error bundling script file with extension when event trigger commands are used |
Previous Message | Amit Kapila | 2017-12-06 06:07:18 | Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com |