Re: Parallel seq. plan is not coming against inheritance or partition table

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: Parallel seq. plan is not coming against inheritance or partition table
Date: 2017-03-06 02:41:50
Message-ID: CAA4eK1J6Ar_fFRukpRr1AU2uviRED-JGXgqQ48id1AW3jhnXgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 4, 2017 at 9:52 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> Hi All,
>
> From following git commit onwards, parallel seq scan is never getting
> selected for inheritance or partitioned tables.
>
> <git-commit>
> commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: Wed Feb 15 13:37:24 2017 -0500
>
> Replace min_parallel_relation_size with two new GUCs.
> </git-commit>
>
>
> Steps to reproduce:
> ==============
> create table t1 (a integer);
>
> create table t1_1 (check (a >=1 and a < 1000000)) inherits (t1);
> create table t1_2 (check (a >= 1000000 and a < 2000000)) inherits (t1);
>
> insert into t1_1 select generate_series(1, 900000);
> insert into t1_2 select generate_series(1000000, 1900000);
>
> analyze t1;
> analyze t1_1;
> analyze t1_2;
>
> explain analyze select * from t1 where a < 50000 OR a > 1950000;
>
> EXPLAIN ANALYZE output:
> ====================
> 1) Prior to "Replace min_parallel_relation_size with two new GUCs" commit,
>
> postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Gather (cost=1000.00..25094.71 rows=48787 width=4) (actual
> time=0.431..184.264 rows=49999 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Append (cost=0.00..19216.01 rows=20328 width=4) (actual
> time=0.025..167.465 rows=16666 loops=3)
> -> Parallel Seq Scan on t1 (cost=0.00..0.00 rows=1 width=4)
> (actual time=0.001..0.001 rows=0 loops=3)
> Filter: ((a < 50000) OR (a > 1950000))
> -> Parallel Seq Scan on t1_1 (cost=0.00..9608.00 rows=20252
> width=4) (actual time=0.023..76.644 rows=16666 loops=3)
> Filter: ((a < 50000) OR (a > 1950000))
> Rows Removed by Filter: 283334
> -> Parallel Seq Scan on t1_2 (cost=0.00..9608.01 rows=75
> width=4) (actual time=89.505..89.505 rows=0 loops=3)
> Filter: ((a < 50000) OR (a > 1950000))
> Rows Removed by Filter: 300000
> Planning time: 0.343 ms
> Execution time: 188.624 ms
> (14 rows)
>
> 2) From "Replace min_parallel_relation_size with two new GUCs" commit onwards,
>
> postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
> Append (cost=0.00..34966.01 rows=50546 width=4) (actual
> time=0.021..375.747 rows=49999 loops=1)
> -> Seq Scan on t1 (cost=0.00..0.00 rows=1 width=4) (actual
> time=0.004..0.004 rows=0 loops=1)
> Filter: ((a < 50000) OR (a > 1950000))
> -> Seq Scan on t1_1 (cost=0.00..17483.00 rows=50365 width=4)
> (actual time=0.016..198.393 rows=49999 loops=1)
> Filter: ((a < 50000) OR (a > 1950000))
> Rows Removed by Filter: 850001
> -> Seq Scan on t1_2 (cost=0.00..17483.01 rows=180 width=4)
> (actual time=173.310..173.310 rows=0 loops=1)
> Filter: ((a < 50000) OR (a > 1950000))
> Rows Removed by Filter: 900001
> Planning time: 0.812 ms
> Execution time: 377.831 ms
> (11 rows)
>
> RCA:
> ====
> From "Replace min_parallel_relation_size with two new GUCs" commit
> onwards, we are not assigning parallel workers for the child rel with
> zero heap pages. This means we won't be able to create a partial
> append path as this requires all the child rels within an Append Node
> to have a partial path.
>

Right, but OTOH, if we assign parallel workers by default, then it is
quite possible that it would result in much worse plans. Consider a
case where partition hierarchy has 1000 partitions and only one of
them is big enough to allow parallel workers. Now in this case, with
your proposed fix it will try to scan all the partitions in parallel
workers which I think can easily result in bad performance. I think
the right way to make such plans parallel is by using Parallel Append
node (https://commitfest.postgresql.org/13/987/) Alternatively, if
you want to force parallelism in cases like the one you have shown in
example, you can use Alter Table .. Set (parallel_workers = 1).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-06 02:50:42 Re: SCRAM authentication, take three
Previous Message Tsunakawa, Takayuki 2017-03-06 02:27:06 Re: Statement-level rollback