Re: parallel joins, and better parallel explain

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel joins, and better parallel explain
Date: 2015-12-18 15:17:33
Message-ID: CA+TgmoZc00iCiZLnORB69T7hmEZ5s3vbA0pc_7xFZUHLV6y6vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 18, 2015 at 3:54 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Fri, Dec 18, 2015 at 7.59 AM Robert Haas <robertmhaas(at)gmail(dot)com> Wrote,
>> Uh oh. That's not supposed to happen. A GatherPath is supposed to
>> have parallel_safe = false, which should prevent the planner from
>> using it to form new partial paths. Is this with the latest version
>> of the patch? The plan output suggests that we're somehow reaching
>> try_partial_hashjoin_path() with inner_path being a GatherPath, but I
>> don't immediately see how that's possible, because
>> create_gather_path() sets parallel_safe to false unconditionally, and
>> hash_inner_and_outer() never sets cheapest_safe_inner to a path unless
>> that path is parallel_safe.
>
> Yes, you are right, that create_gather_path() sets parallel_safe to false
> unconditionally but whenever we are building a non partial path, that time
> we should carry forward the parallel_safe state to its parent, and it seems
> like that part is missing here..

Ah, right. Woops. I can't exactly replicate your results, but I've
attempted to fix this in a systematic way in the new version attached
here (parallel-join-v3.patch).

>> Do you have a self-contained test case that reproduces this, or any
>> insight as to how it's happening here?
>
> This is TPC-H benchmark case:
> we can setup like this..
> 1. git clone https://tkejser(at)bitbucket(dot)org/tkejser/tpch-dbgen.git
> 2. complie using make
> 3. ./dbgen –v –s 5
> 4. ./qgen

Thanks. After a bit of fiddling I was able to get this to work. I'm
attaching two other patches that seem to help this case quite
considerably. The first (parallel-reader-order-v1) cause Gather to
read from the same worker repeatedly until it can't get another tuple
from that worker without blocking, and only then move on to the next
worker. With 4 workers, this seems to be drastically more efficient
than what's currently in master - I saw the time for Q5 drop from over
17 seconds to about 6 (this was an assert-enabled build running with
EXPLAIN ANALYZE, though, so take those numbers with a grain of salt).
The second (gather-disuse-physical-tlist.patch) causes Gather to force
underlying scan nodes to project, which is a good idea here for
reasons very similar to why it's a good idea for the existing node
types that use disuse_physical_tlist: forcing extra data through the
Gather node is bad. That shaved another half second off this query.

The exact query I was using for testing was:

explain (analyze, verbose) select n_name, sum(l_extendedprice * (1 -
l_discount)) as revenue from customer, orders, lineitem, supplier,
nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey
and l_suppkey = s_suppkey and c_nationkey = s_nationkey and
s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name =
'EUROPE' and o_orderdate >= date '1995-01-01' and o_orderdate < date
'1995-01-01' + interval '1' year group by n_name order by revenue
desc;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
parallel-join-v3.patch text/x-patch 61.3 KB
parallel-reader-order-v1.patch text/x-patch 1.4 KB
gather-disuse-physical-tlist.patch text/x-patch 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-12-18 15:36:53 Re: Remove array_nulls?
Previous Message Robert Treat 2015-12-18 14:52:08 Re: Remove array_nulls?