From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SegFault on 9.6.14
Date: 2019-08-12 01:04:38
Message-ID: CA+hUKGK=dChcQtgf1-r73XEqDhW1VD32Tf9qDi39aYbjuE3mGA@mail.gmail.com
Lists: pgsql-hackers
On Sat, Aug 10, 2019 at 12:59 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But beyond that, the issue here is that the Limit node is shutting
> down the Gather node too early, and the right fix must be to stop
> doing that, not to change the definition of what it means to shut down
> a node, as this patch does. So maybe a possible approach here - which
> I think is more or less what Tom is proposing - is:
>
> 1. Remove the code from ExecLimit() that calls ExecShutdownNode().
> 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode()
> gets called at the very end of execution, at least when execute_once
> is set, before exiting parallel mode.
> 3. Figure out, possibly at a later time or only in HEAD, how to make
> the early call to ExecLimit() in ExecShutdownNode(), and then put it
> back. I think we could do this by passing down some information
> indicating which nodes are potentially rescanned by other nodes higher
> up in the tree; there's the separate question of whether rescans can
> happen due to cursor operations, but the execute_once stuff can handle
> that aspect of it, I think.
>
> I'm not quite sure that approach is altogether correct so I'd
> appreciate some analysis on that point.
I'm not sure exactly what we should do yet, but one thought I wanted
to resurrect from older discussions is that we now think it was a
mistake to give every Gather node its own DSM segment, having seen
queries in the wild where that decision interacted badly with large
numbers of partitions. In 13 we should try to figure out how to have a
single DSM segment allocated for all Gather[Merge] nodes in the tree
(and remove the embarrassing band-aid hack in commit fd7c0fa7).
That's possibly relevant because it means we'd have a ParallelContext
or some new overarching object that has a lifetime that is longer than
the individual Gather nodes' processes and instrumentation data. I'm
not saying we need to discuss any details of this other concern now,
I'm just wondering out loud if the whole problem in this thread goes
away automatically when we fix it.
--
Thomas Munro
https://enterprisedb.com