Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Date: 2016-07-01 13:52:56
Message-ID: CA+TgmobRf_GrdYoq-2R=e6RjBOfODuP0pVGaTL5tS7pyvMa-xg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 20, 2016 at 3:06 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
> IMHO, "actual rows" of "Parallel Seq Scan" should not be divided by the loops,
> because the total rows processed here is 10000000, not 3333333 * 3.
> I think the actual row value shown here, "3333333", is a bit confusing and
> tricky for users.

I don't think the total number of rows for a node should EVER be
divided by the number of loops. We've had that behavior for a long
time and I find it to be mind-numbingly stupid. Whenever I'm reading
an EXPLAIN plan, I end up trying to figure out what's really going on
by multiplying the number of rows shown by the loop count, but often
the row count is very small, like 0 or 1 or 2, so the round-off error
is large and you can't really tell what's happening. Nobody wants to
know the average number of rows returned per node execution: they
want, as you want here, the total number of rows that node ever
processed. I doubt we can convince Tom Lane to let us change it, but
feel free to post a patch.
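To make the round-off problem concrete, here is a minimal Python sketch. It assumes a simplified model in which the "actual rows" figure is the node's total row count divided by loops and printed with no decimal places. The function names are hypothetical and are not PostgreSQL internals.

```python
def displayed_rows(total_rows: int, loops: int) -> str:
    """Hypothetical model of the per-loop figure EXPLAIN ANALYZE shows:
    total rows divided by loops, printed with zero decimal places."""
    return f"{total_rows / loops:.0f}"


def reconstructed_total(total_rows: int, loops: int) -> int:
    """What a reader recovers by multiplying the displayed figure by loops."""
    return int(displayed_rows(total_rows, loops)) * loops


# The Parallel Seq Scan case from the quoted message, plus two small-count cases.
for total, loops in [(10_000_000, 3), (7, 3), (2, 5)]:
    shown = displayed_rows(total, loops)
    print(f"total={total} loops={loops} shown={shown} "
          f"shown*loops={reconstructed_total(total, loops)}")
```

Under this model, multiplying back gives 9999999 instead of 10000000 for the large case, which is harmless, but a node that really returned 2 rows over 5 loops displays 0, and multiplying back recovers nothing at all.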

One thing I don't think we can do here is have some weird exception
where parallel query works differently from everything else. "loops"
just counts the number of times that the node was executed. It most
often ends up >1 when the plan node is on the inner side of a nested
loop, but parallel query ends up creating that scenario also. There's
no real way to separate those things out, though. If a node executes
3 times in one worker, 4 times in another, and once in the leader,
what value are you going to display for loops other than 8? And if
you accept that's the right answer in that case, then you pretty much
need the answer when it executes once in one worker, once in another
worker, and once in the leader to be 3. I agree that this is very
confusing - and you're not the first person to complain about it - but
I think that parallel query is merely throwing light on the fact that
the pre-existing behavior of EXPLAIN is poorly chosen, not creating
any fundamentally new issue.
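The counting argument above can be sketched in Python. This is a simplified model, assuming each process (every worker and the leader) keeps its own loops and rows counters for a plan node and that the displayed values are plain sums across processes. NodeCounters and combine are hypothetical names, and the row counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeCounters:
    # Hypothetical per-process counters for one plan node.
    loops: int = 0  # number of times the node was executed in this process
    rows: int = 0   # rows the node returned in this process, over all loops


def combine(per_process: list[NodeCounters]) -> NodeCounters:
    """Sum the counters from every worker and the leader into one node total."""
    total = NodeCounters()
    for c in per_process:
        total.loops += c.loops
        total.rows += c.rows
    return total


# Executed 3 times in one worker, 4 times in another, once in the leader.
nested = combine([NodeCounters(3, 30), NodeCounters(4, 40), NodeCounters(1, 10)])
# Parallel Seq Scan: each participant executes the node once (made-up row split).
parallel = combine([NodeCounters(1, 3_333_334), NodeCounters(1, 3_333_333),
                    NodeCounters(1, 3_333_333)])

for name, c in [("nested", nested), ("parallel", parallel)]:
    print(f"{name}: loops={c.loops} total rows={c.rows} "
          f"per-loop rows={c.rows / c.loops:.0f}")
```

Summing is the only rule that yields 8 loops in the first case, and the same rule yields 3 loops for a Parallel Seq Scan that runs once per participant. Dividing rows by that loop count is what turns the 10000000-row total into a displayed 3333333.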

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
