Re: A reloption for partitioned tables - parallel_workers

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Seamus Abshere <seamus(at)abshere(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A reloption for partitioned tables - parallel_workers
Date: 2021-04-02 14:36:27
Message-ID: 05f271ec2076d651e98f95fd9fc63784e4f34a57.camel@cybertec.at
Lists: pgsql-hackers

On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > Attached a new version rebased over c8f78b616, with the grouping
> > relation partitioning enhancements as a separate patch 0001. Sorry
> > about the delay.
>
> I had a quick look at this and wondered if the partitioned table's
> parallel workers shouldn't be limited to the sum of the parallel
> workers of the Append's subpaths?
>
> It seems a bit weird to me that the following case requests 4 workers:
>
> # create table lp (a int) partition by list(a);
> # create table lp1 partition of lp for values in(1);
> # insert into lp select 1 from generate_series(1,10000000) x;
> # alter table lp1 set (parallel_workers = 2);
> # alter table lp set (parallel_workers = 4);
> # set max_parallel_workers_per_gather = 8;
> # explain select count(*) from lp;
>                                          QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
>    ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
>          Workers Planned: 4
>          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
>                ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
> (5 rows)
>
> I can see a good argument that there should only be 2 workers here.

Good point, I agree.
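
Just to make the idea concrete, here is a rough sketch of what such a cap
could look like in add_paths_to_append_rel() - this is only an illustration
using the variable names I assume are in scope there, not a proposed patch:

    /*
     * Sketch only: clamp the parallel_workers taken from the partitioned
     * table's reloption to the sum of the workers its partial subpaths
     * ask for, so that pruning partitions also lowers the number of
     * workers the Append requests.
     */
    int         sum_subpath_workers = 0;
    ListCell   *lc;

    foreach(lc, partial_subpaths)
    {
        Path   *subpath = (Path *) lfirst(lc);

        /* count at least one worker per scanned partition */
        sum_subpath_workers += Max(subpath->parallel_workers, 1);
    }

    if (rel->rel_parallel_workers > 0)
        parallel_workers = Min(rel->rel_parallel_workers,
                               sum_subpath_workers);

    /* never exceed the GUC limit */
    parallel_workers = Min(parallel_workers,
                           max_parallel_workers_per_gather);

With that, the example above would plan 2 workers rather than 4.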

> If someone sets the partitioned table's parallel_workers high so that
> they get a large number of workers when no partitions are pruned
> during planning, do they really want the same number of workers in
> queries where a large number of partitions are pruned?
>
> This problem gets a bit more complex in generic plans where the
> planner can't prune anything but run-time pruning prunes many
> partitions. I'm not so sure what to do about that, but the problem
> does exist today to a lesser extent with the current method of
> determining the append parallel workers.

Also a good point. That would require changing the actual number of
parallel workers at execution time, which is tricky.
If we go with your suggestion above, we'd have to distinguish whether
the number of workers was chosen because one partition is large enough
to warrant a parallel scan (then it shouldn't be reduced when the executor
prunes partitions) or because of the number of partitions
(then it should be reduced).
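
To illustrate the distinction (purely hypothetical, nothing like this
exists in the tree today), the planner would have to record something like:

    /*
     * Hypothetical sketch: remember why the Append's worker count was
     * chosen, so run-time pruning could tell whether it is safe to
     * scale the number down.
     */
    typedef enum AppendWorkerOrigin
    {
        APPEND_WORKERS_FROM_CHILD_SIZE,     /* one large partition drove the
                                             * number; keep it after pruning */
        APPEND_WORKERS_FROM_CHILD_COUNT     /* the number of partitions drove
                                             * it; reduce it after pruning */
    } AppendWorkerOrigin;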

Currently, we don't reduce parallelism if the executor prunes
partitions, so this could be seen as an independent problem.

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

Perhaps Amit's patch 0001 should go in independently.

I'll mark the patch as "waiting for author".

Yours,
Laurenz Albe
