Re: Explain buffers wrong counter with parallel plans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Explain buffers wrong counter with parallel plans
Date: 2018-07-07 04:03:41
Message-ID: CAA4eK1JW0JVVznGOYcjnii35wyz3dw_sv1H1ANp8hR7D85ZwyQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 7, 2018 at 7:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sat, Jul 7, 2018 at 12:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Jul 6, 2018 at 9:44 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> I have tried this idea, but it doesn't completely solve the problem.
>>> The problem is that nodes below LIMIT won't get a chance to accumulate
>>> the stats as they won't be able to call InstrStopNode.
>>
>> I'm not sure I understand. Why not? I see that we'd need to insert
>> an extra call to InstrStopNode() if we were stopping the node while it
>> was running, because then InstrStartNode() would have already been
>> done, but the corresponding call to InstrStopNode() would not have
>> been done. But I'm not sure how that would happen in this case. Can
>> you explain further?
>>
>
> Okay, let me try. The code flow is that for each node we call
> InstrStartNode() -> ExecProcNodeReal() -> InstrStopNode(). Now let's
> say we have to execute the plan Limit -> Gather -> Parallel Seq Scan.
> First, for the Limit node, we call InstrStartNode() and
> ExecProcNodeReal(); then, for Gather, we call InstrStartNode(),
> ExecProcNodeReal() and InstrStopNode(). Now the Limit node decides that
> it needs to shut down all the nodes (ExecShutdownNode), and only after
> that does it call InstrStopNode() for the Limit node. So, in this flow,
> once the nodes have been shut down, the Gather node never gets a chance
> to count the stats collected during ExecShutdownNode.
>

I went ahead and tried the solution I mentioned yesterday, namely to
allow ExecShutdownNode to count stats. Apart from fixing this problem,
it also fixes the problem with Gather Merge reported by Adrien [1],
because Gather Merge now also gets a chance to count stats after
shutting down the workers.
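
To make the flow above concrete, here is a toy model in C. It is not the
attached patch and not PostgreSQL code: ToyNode, toy_start, toy_stop,
toy_shutdown_gather and the buffer counts (10 and 90) are all made up
for illustration. It assumes that the workers' buffer usage is folded
into the leader's global counter while Gather is being shut down, and
it sketches one possible way of letting the shutdown step count stats,
by bracketing it with a start/stop pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-node instrumentation; NOT the PostgreSQL struct. */
typedef struct ToyNode
{
    long reported_bufs;   /* what EXPLAIN (BUFFERS) would show for the node */
    long start_snapshot;  /* value of the global counter at start time */
    bool running;
} ToyNode;

typedef struct ToyResult
{
    long limit_bufs;
    long gather_bufs;
} ToyResult;

/* Toy stand-in for the leader's global buffer usage counter. */
static long global_bufs;

static void toy_start(ToyNode *n)
{
    assert(!n->running);
    n->start_snapshot = global_bufs;
    n->running = true;
}

static void toy_stop(ToyNode *n)
{
    assert(n->running);
    n->reported_bufs += global_bufs - n->start_snapshot;
    n->running = false;
}

/*
 * Assumption: shutting down Gather is the moment the workers' buffer usage
 * lands in the leader's global counter.  With instrument_shutdown set, the
 * shutdown is bracketed by start/stop so the node can see that usage.
 */
static void toy_shutdown_gather(ToyNode *gather, long worker_bufs,
                                bool instrument_shutdown)
{
    if (instrument_shutdown)
        toy_start(gather);
    global_bufs += worker_bufs;
    if (instrument_shutdown)
        toy_stop(gather);
}

/* Models Limit -> Gather -> Parallel Seq Scan with an early shutdown. */
ToyResult toy_run_limit_over_gather(bool instrument_shutdown)
{
    ToyNode   limit = {0, 0, false};
    ToyNode   gather = {0, 0, false};
    ToyResult res;

    global_bufs = 0;

    toy_start(&limit);              /* start/stop pair for Limit opens */

    toy_start(&gather);             /* Gather fetches a tuple */
    global_bufs += 10;              /* hypothetical leader-side reads */
    toy_stop(&gather);

    /* Limit is satisfied and shuts down its child while still running. */
    toy_shutdown_gather(&gather, 90, instrument_shutdown);

    toy_stop(&limit);               /* Limit sees everything */

    res.limit_bufs = limit.reported_bufs;
    res.gather_bufs = gather.reported_bufs;
    return res;
}
```

In this model, toy_run_limit_over_gather(false) leaves Gather with only
its own 10 buffers while Limit reports 100, which is the kind of
mismatch discussed in this thread; toy_run_limit_over_gather(true)
brings both nodes to 100.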

Note that I have moved the call to InstrStartParallelQuery in
ParallelQueryMain so that the buffer usage stats are accumulated only
for plan execution, which is what we do for the instrumentation
information as well. If we don't do that, we will count some additional
buffer usage for ExecutorStart, which won't match what we have in the
Instrumentation structure of each node.
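
The effect of where the tracking window starts can be shown with
another toy model in C. Again, this is only a sketch under stated
assumptions and not the code in ParallelQueryMain: ToyWorkerReport,
toy_parallel_worker and the buffer counts (3 for executor startup, 20
for running the plan) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the worker's global buffer usage counter. */
static long worker_bufs;

typedef struct ToyWorkerReport
{
    long reported_to_leader;    /* usage the worker hands back */
    long node_instrumentation;  /* usage seen by per-node instrumentation */
} ToyWorkerReport;

/*
 * Models a parallel worker that starts the executor and then runs the plan.
 * track_from_executor_start selects where the tracking window opens:
 * before executor startup (true) or just before plan execution (false).
 */
ToyWorkerReport toy_parallel_worker(bool track_from_executor_start)
{
    ToyWorkerReport r = {0, 0};
    long window_start = 0;
    long run_start;

    worker_bufs = 0;

    if (track_from_executor_start)
        window_start = worker_bufs;     /* window opens before startup */

    worker_bufs += 3;                   /* hypothetical startup reads */

    if (!track_from_executor_start)
        window_start = worker_bufs;     /* window opens after startup */

    /* Per-node instrumentation only ever covers plan execution. */
    run_start = worker_bufs;
    worker_bufs += 20;                  /* hypothetical plan execution reads */
    r.node_instrumentation = worker_bufs - run_start;

    r.reported_to_leader = worker_bufs - window_start;
    assert(r.reported_to_leader >= r.node_instrumentation);
    return r;
}
```

With the window opened before executor startup, the toy worker reports
23 buffers against 20 in the node instrumentation; with the window
opened just before plan execution, both are 20.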

[1] - https://www.postgresql.org/message-id/01952aab-33ca-36cd-e74b-ce32f3eefc84%40anayrat.info

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: fix_gather_stats_v1.patch (application/octet-stream, 2.9 KB)
