From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel joins, and better parallel explain
Date: 2016-01-04 09:50:03
Message-ID: CAFiTN-uAkSaSKLZrzk8_VfQ64+9dX8i-w4SSkQGHMKxtQYzTbw@mail.gmail.com
Lists: pgsql-hackers
On Thu, Dec 24, 2015 at 4:45 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 23, 2015 at 2:34 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
> > Yeah, right. After applying all three patches this problem is fixed;
> > now parallel hash join is faster than normal hash join.
> >
> > I have tested one more case which Amit mentioned, and I can see that in
> > that case the parallel plan (parallel degree >= 3) is still slow. In the
> > normal case it selects "Hash Join", but with parallel workers > 3 it
> > selects a parallel "Nested Loop Join", which makes it costlier.
>
> While investigating this problem, I discovered that I can produce a
> regression even on unpatched master:

yeah, right..

> But this is not entirely the fault of the parallel query code. If you
> force a seqscan-over-seqscan plan in the non-parallel case, it
> estimates the join cost as 287772.00, only slightly more than the
> 261522.02 cost units it thinks a non-parallel hash join will cost. In
> fact, however, the non-parallel hash join runs in 1.2 seconds and the
> non-parallel nested loop takes 46 seconds.

right..

> Updated patch attached.
Thanks for the updated patch...
I have found a regression in one more scenario, in hash join.

Scenario:
--------------
I created an inner table such that its data does not fit in RAM (I created
a VM with 1GB RAM). The purpose of this is to make the disk scan dominant;
since the parallel join repeats the disk scan and hash-table build of the
inner table in every participating process (note loops=7 on the Hash node
below), there is a lot of duplicated I/O and the query has to pay that
penalty. I think that even though the inner table scan and hash-table build
happen in parallel, all that duplicated I/O will become the bottleneck.
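
To put rough numbers on it: with 6 workers the inner table is scanned and
hashed 7 times (6 workers plus the leader), so the inner-side I/O is
multiplied by 7. A quick way to see the scale of that (a rough helper
query of mine; the exact sizes will vary):

-- 6 workers + leader = 7 full scans of the inner table
SELECT pg_size_pretty(pg_relation_size('t2'))     AS scanned_once,
       pg_size_pretty(7 * pg_relation_size('t2')) AS scanned_by_all;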
Do we need to consider the cost of this parallel I/O as well? I am not sure
whether we can really do that...
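
One way to at least measure the duplicated reads, rather than cost them,
is the BUFFERS option of EXPLAIN, which reports block reads per plan node:

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t2.c2 + t1.c1 > 100;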
Note: I have tested on a 1GB RAM machine and an 8GB RAM machine; the
regression on the 8GB machine is smaller than on the 1GB machine.
create table t1 (c1 int, c2 int, c3 text);
create table t2 (c1 int, c2 int, c3 text);
insert into t1 values(generate_series(1,100000000),
generate_series(1,100000000), repeat('x', 100));
insert into t2 values(generate_series(1,48000000),
generate_series(1,48000000), repeat('x', 5));
analyze t1;
analyze t2;
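
As a rough size check (approximate figures; exact sizes will vary a bit):

SELECT pg_size_pretty(pg_table_size('t1'));  -- roughly 13-14 GB: 100M rows, 100-byte text column
SELECT pg_size_pretty(pg_table_size('t2'));  -- roughly 2 GB: still larger than the 1GB machine's RAM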
Test with 1GB RAM:
-----------------------------
postgres=# set max_parallel_degree=0;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t2.c2 + t1.c1 > 100;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12248485.55..12248485.56 rows=1 width=0) (actual time=147490.455..147490.455 rows=1 loops=1)
   ->  Hash Join  (cost=1526963.25..12208485.47 rows=16000033 width=0) (actual time=26652.871..143368.989 rows=47999950 loops=1)
         Hash Cond: (t1.c1 = t2.c1)
         Join Filter: ((t2.c2 + t1.c1) > 100)
         Rows Removed by Join Filter: 50
         ->  Seq Scan on t1  (cost=0.00..2742412.72 rows=100005072 width=4) (actual time=130.580..40127.004 rows=100000000 loops=1)
         ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=26500.439..26500.439 rows=48000000 loops=1)
               Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
               ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.039..11402.343 rows=48000000 loops=1)
 Planning time: 0.410 ms
 Execution time: 147490.553 ms
(11 rows)
postgres=# set max_parallel_degree=6;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t2.c2 + t1.c1 > 100;
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4969933.98..4969933.99 rows=1 width=0) (actual time=386024.487..386024.488 rows=1 loops=1)
   ->  Gather  (cost=1527963.25..4929933.89 rows=16000033 width=0) (actual time=199190.138..379487.861 rows=47999950 loops=1)
         Number of Workers: 6
         ->  Hash Join  (cost=1526963.25..3328930.59 rows=16000033 width=0) (actual time=178885.161..320724.381 rows=6857136 loops=7)
               Hash Cond: (t1.c1 = t2.c1)
               Join Filter: ((t2.c2 + t1.c1) > 100)
               Rows Removed by Join Filter: 7
               ->  Parallel Seq Scan on t1  (cost=0.00..421909.65 rows=15385396 width=4) (actual time=106.403..11735.643 rows=14285714 loops=7)
               ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=177959.433..177959.433 rows=48000000 loops=7)
                     Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
                     ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.022..20778.693 rows=48000000 loops=7)
 Planning time: 0.372 ms
 Execution time: 386025.056 ms
(13 rows)
Test with 8GB RAM:
---------------------------
postgres=# set max_parallel_degree=0;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t2.c2 + t1.c1 > 100;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12229853.83..12229853.84 rows=1 width=0) (actual time=111113.286..111113.286 rows=1 loops=1)
   ->  Hash Join  (cost=1526963.25..12189853.75 rows=16000033 width=0) (actual time=15830.319..108557.658 rows=47999950 loops=1)
         Hash Cond: (t1.c1 = t2.c1)
         Join Filter: ((t2.c2 + t1.c1) > 100)
         Rows Removed by Join Filter: 50
         ->  Seq Scan on t1  (cost=0.00..2724138.00 rows=100000000 width=4) (actual time=3.515..43207.798 rows=100000000 loops=1)
         ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=15436.088..15436.088 rows=48000000 loops=1)
               Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
               ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.677..6290.310 rows=48000000 loops=1)
 Planning time: 0.287 ms
 Execution time: 111113.358 ms
(11 rows)
postgres=# set max_parallel_degree=6;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t2.c2 + t1.c1 > 100;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6538149.22..6538149.23 rows=1 width=0) (actual time=172636.184..172636.184 rows=1 loops=1)
   ->  Gather  (cost=1527963.25..6498149.14 rows=16000033 width=0) (actual time=40952.576..168973.552 rows=47999950 loops=1)
         Number of Workers: 6
         ->  Hash Join  (cost=1526963.25..4897145.84 rows=16000033 width=0) (actual time=41109.818..151129.893 rows=6857136 loops=7)
               Hash Cond: (t1.c1 = t2.c1)
               Join Filter: ((t2.c2 + t1.c1) > 100)
               Rows Removed by Join Filter: 7
               ->  Parallel Seq Scan on t1  (cost=0.00..1890804.67 rows=16666667 width=4) (actual time=0.492..86241.998 rows=14285714 loops=7)
               ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=40936.920..40936.920 rows=48000000 loops=7)
                     Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
                     ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.024..22644.484 rows=48000000 loops=7)
 Planning time: 2.668 ms
 Execution time: 172636.647 ms
(13 rows)
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com