Re: SegFault on 9.6.14

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: SegFault on 9.6.14
Date: 2019-07-23 03:40:46
Message-ID: CA+hUKGLh1XKbqrq22UxEn3iJLoqV261OTKy9uqA8PYuxT6hHPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Jul 18, 2019 at 7:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > > Hmm, so something like a new argument "bool final" added to the
> > > ExecXXXShutdown() functions, which receives false in this case to tell
> > > it that there could be a rescan so keep the parallel context around.
> >
> > I think this is going in the wrong direction. Nodes should *always*
> > assume that a rescan is possible until ExecEndNode is called.
>
> I am thinking that why not we remove the part of destroying the
> parallel context (and shared memory) from ExecShutdownGather (and
> ExecShutdownGatherMerge) and then do it at the time of ExecEndGather
> (and ExecEndGatherMerge)? This should fix the bug in hand and seems
> to be more consistent with our overall design principles. I have not
> tried to code it to see if there are any other repercussions of the
> same but seems worth investigating. What do you think?

I tried moving ExecParallelCleanup() into ExecEndGather(). The first
problem is that ExecutePlan() wraps execution in
EnterParallelMode()/ExitParallelMode(), but ExitParallelMode() fails
an assertion that no parallel context is active because
ExecEndGather() hasn't run yet. The enclosing
ExecutorStart()/ExecutorEnd() calls are further down the call stack,
in ProcessQuery(). So some more restructuring might be needed to exit
parallel mode later, but then I feel like you might be getting way out
of back-patchable territory, especially if it involves moving code to
the other side of the executor hook boundary. Is there an easier way?

Another idea from the band-aid-solutions-that-are-easy-to-back-patch
department: in ExecutePlan() where we call ExecShutdownNode(), we
could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have
ExecGatherShutdown() only run ExecParallelCleanup() if it sees that
flag. That's not beautiful, but it's less churn that the 'bool final'
argument we discussed before, and could be removed in master when we
have a better way.

Stepping back a bit, it seems like we need two separate tree-walking
calls: one to free resources not needed anymore by the current rescan
(workers), and another to free resources not needed ever again
(parallel context). That could be spelled ExecShutdownNode(false) and
ExecShutdownNode(true), or controlled with the EXEC_FLAG_DONE kluge,
or a new additional ExecSomethingSomethingNode() function, or as you
say, perhaps the second thing could be incorporated into
ExecEndNode(). I suspect that the Shutdown callbacks for Hash, Hash
Join, Custom Scan and Foreign Scan might not be needed anymore if we
could keep the parallel context around until after the run
ExecEndNode().

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2019-07-23 03:46:59 RE: Speed up transaction completion faster after many relations are accessed in a transaction
Previous Message Kyotaro Horiguchi 2019-07-23 03:26:28 "Attachment not found" by CF-app