Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-05-10 21:30:58
Message-ID: CA+TgmoY25MrSA=N7uLyq0eVSL5M=OUzj6tZe0y76Mx0XDmq0pA@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 7, 2015 at 3:23 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > I observed one issue while working on this review comment. When we
>> > try to destroy the parallel setup via ExecEndNode() (since, because of
>> > the Limit node, it could not be destroyed after consuming all tuples),
>> > it waits for the parallel workers to finish
>> > (WaitForParallelWorkersToFinish()), while the parallel workers are
>> > waiting for the master backend to signal them because their queue is
>> > full. I think in such a case the master backend needs to inform the
>> > workers either when the scan is discontinued due to the Limit node or
>> > while waiting for the parallel workers to finish.
>>
>> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
>> That's supposed to unstick the workers; any impending or future writes
>> will just return SHM_MQ_DETACHED without waiting.
>
> Okay, that can work if we call it in ExecEndNode() before
> WaitForParallelWorkersToFinish(); however, what if we want to do something
> like TupleQueueFunnelShutdown() when the Limit node decides to stop
> processing the outer node? We could traverse the whole plan tree and find
> the nodes where parallel workers need to be stopped, but I don't think
> that's a good way to handle it. If we don't stop the workers until
> ExecutorEnd()->ExecEndNode(), they will keep running until then, and it
> won't be easy to get instrumentation/buffer usage information from them
> (workers fill in that information for the master backend after execution
> is complete), because that is gathered before ExecutorEnd(). For EXPLAIN
> ANALYZE, we can ensure that workers are stopped before fetching that
> information from the Funnel node, but the same is not easy for the buffer
> usage stats required by plugins, since those operate at the ExecutorRun()
> and ExecutorFinish() level, where we don't have direct access to
> node-level information. You can refer to pgss_ExecutorEnd(), which
> completes the storage of stats information before calling ExecutorEnd().
> Offhand, I could not think of a good way to do this, but one crude way
> could be to introduce a new API (ParallelExecutorEnd()) for such plugins,
> which needs to be called before completing the stats accumulation.
> This API would call ExecEndPlan() if the parallelModeNeeded flag is set
> and allow accumulation of stats (InstrStartNode()/InstrStopNode()).
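
For concreteness, I'm reading the ParallelExecutorEnd() proposal above as a
hypothetical entry point shaped roughly like the sketch below. Nothing in it
is in the patch; the function body is only a placeholder for the idea, and
the parallelModeNeeded test just mirrors the flag you mention.

#include "postgres.h"

#include "executor/executor.h"

/*
 * HYPOTHETICAL sketch, not patch code: something a plugin such as
 * pg_stat_statements could call from its ExecutorEnd hook before it
 * captures timing and buffer-usage totals, so that parallel workers have
 * already been stopped and their numbers folded into the leader's counters.
 */
static void
ParallelExecutorEnd(QueryDesc *queryDesc)
{
    /* Nothing to do unless this query actually ran in parallel mode. */
    if (!queryDesc->plannedstmt->parallelModeNeeded)
        return;

    /*
     * Here the executor would shut down the tuple queues (unsticking any
     * worker blocked on a full queue), wait for the workers to exit, and
     * accumulate their Instrumentation/BufferUsage into the leader before
     * the plugin reads queryDesc->totaltime.
     */
}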

OK, so if I understand you here, the problem is what to do about an
"orphaned" worker. The Limit node just stops fetching from the lower
nodes, and those nodes don't get any clue that this has happened, so
their workers just sit there until the end of the query. Of course,
that happens already, but it doesn't usually hurt very much, because
the Limit node usually appears at or near the top of the plan.
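
Just to spell out the unsticking mechanism I was referring to: once the
leader detaches its end of a worker's tuple queue (which is what
TupleQueueFunnelShutdown() is meant to trigger in the patch), the worker's
sends stop blocking. Roughly like this, purely illustrative and not the
patch's actual send loop:

#include "postgres.h"

#include "storage/shm_mq.h"

/*
 * Worker side, illustrative: after the leader detaches its end of the
 * queue, a pending or future send returns SHM_MQ_DETACHED immediately
 * instead of waiting, so the worker can stop producing tuples and exit.
 */
static bool
worker_send_tuple(shm_mq_handle *mqh, void *data, Size len)
{
    shm_mq_result result;

    result = shm_mq_send(mqh, len, data, false);    /* nowait = false */

    if (result == SHM_MQ_DETACHED)
        return false;           /* leader is gone; give up cleanly */

    return (result == SHM_MQ_SUCCESS);
}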

It could matter, though. Suppose the Limit is for a subquery that has
a Sort somewhere (not immediately) beneath it. My guess is the Sort's
tuplestore will stick around not just until the subquery finishes
executing, but for as long as the top-level query is executing, which in
theory could be a huge waste of resources. In practice, I guess
people don't really write queries that way. If they did, I think we'd
have already developed some general method for fixing this sort of
problem.

I think it might be better to try to solve this problem in a more
localized way. Can we arrange for planstate->instrumentation to point
directly into the DSM, instead of copying the data over later? That
seems like it might help, or perhaps there's another approach.
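
Something like the sketch below, perhaps: the leader carves out one
Instrumentation slot per worker in the DSM's table of contents at setup
time, and each worker points its node's PlanState->instrument at its own
slot, so InstrStartNode()/InstrStopNode() update shared memory in place and
there is nothing to copy back afterwards. All the names and the TOC key
here are invented for illustration; this is not code from the patch.

#include "postgres.h"

#include "executor/instrument.h"
#include "nodes/execnodes.h"
#include "storage/shm_toc.h"
#include "storage/shmem.h"

/* TOC key for the shared instrumentation array; value made up for this sketch. */
#define PARALLEL_KEY_INSTRUMENT     UINT64CONST(0xE000000000000001)

/*
 * Leader, at parallel-setup time: allocate one Instrumentation slot per
 * worker directly in the DSM segment and publish it in the TOC.
 */
static Instrumentation *
create_shared_instrumentation(shm_toc *toc, int nworkers)
{
    Instrumentation *instr;

    instr = (Instrumentation *)
        shm_toc_allocate(toc, mul_size(sizeof(Instrumentation), nworkers));
    memset(instr, 0, sizeof(Instrumentation) * nworkers);
    shm_toc_insert(toc, PARALLEL_KEY_INSTRUMENT, instr);

    return instr;
}

/*
 * Worker, at startup: point the scan node's instrumentation at this
 * worker's slot, so InstrStartNode()/InstrStopNode() write straight into
 * the DSM and the leader can read the totals without a copy step.
 */
static void
attach_shared_instrumentation(PlanState *planstate, shm_toc *toc, int worker_id)
{
    Instrumentation *instr;

    instr = (Instrumentation *) shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENT);
    planstate->instrument = &instr[worker_id];
}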

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
