From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SegFault on 9.6.14 |
Date: | 2019-08-09 12:59:26 |
Message-ID: | CA+TgmoZOK7XJL_b9mcEB7C2WzxfLi8j7M4APK=MOhe+dAivENA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 7, 2019 at 5:45 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> I have made a patch based on the above lines.
> I have tested the scenarios which Thomas had shared in the earlier
> mail and few more tests based on Thomas's tests.
> I'm not sure if we will be going ahead with this solution or not.
> Let me know your opinion on the same.
> If you feel this approach is ok, we can add few of this tests into pg tests.
I think this patch is bizarre:
- It introduces a new function called getParallelModeLevel(), which is
randomly capitalized different from the other functions that do
similar things, and then uses it to do the same thing that could have
been done with the existing function IsInParallelMode().
- It contains an "if" statement whose only content is an Assert().
Don't write if (a) Assert(b); write Assert(!a || b).
- It contains zero lines of comment changes, which is obviously not
enough for a patch that proposes to fix a very thorny issue. This
failure has two parts. First, it adds no new comments to explain the
bug being fixed or the theory of operation of the new code. Second, it
does not even bother updating existing comments that are falsified by
the patch, such as the function header comments for
ExecParallelCleanup and ExecShutdownGather.
- It changes what ExecParallelCleanup does while adjusting only one of
the two callers to match the behavior change. nodeGatherMerge.c
manages to be completed untouched by this patch. If you change what a
function does, you really need to grep for all the calls to that
function and adjust all callers to match the new set of expectations.
It's a little hard to get past all of those issues and look at what
the patch actually does, but I'm going to try: the theory of operation
of the patch seems to be that we can skip destroying the parallel
context when performing ExecParallelCleanup and in fact when exiting
parallel mode, and then when we get to executor end time the context
will still be there and we can fish the instrumentation out of it. But
this seems problematic for several reasons. For one thing, as Amit
already observed, the code currently contains an assertion which
ensure that a ParallelContext can't outlive the time spent in parallel
mode, and it doesn't seem desirable to relax that assertion (this
patch removes it).
But beyond that, the issue here is that the Limit node is shutting
down the Gather node too early, and the right fix must be to stop
doing that, not to change the definition of what it means to shut down
a node, as this patch does. So maybe a possible approach here - which
I think is more or less what Tom is proposing - is:
1. Remove the code from ExecLimit() that calls ExecShutdownNode().
2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode()
gets called at the very end of execution, at least when execute_once
is set, before exiting parallel mode.
3. Figure out, possibly at a later time or only in HEAD, how to make
the early call to ExecLimit() in ExecShutdownNode(), and then put it
back. I think we could do this by passing down some information
indicating which nodes are potentially rescanned by other nodes higher
up in the tree; there's the separate question of whether rescans can
happen due to cursor operations, but the execute_once stuff can handle
that aspect of it, I think.
I'm not quite sure that approach is altogether correct so I'd
appreciate some analysis on that point.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2019-08-09 13:06:26 | Re: block-level incremental backup |
Previous Message | Michael Paquier | 2019-08-09 10:09:22 | Re: Add "password_protocol" connection parameter to libpq |