Re: increased max_parallel_workers_per_gather results in fewer workers?

From: Philip Semanchuk <philip(at)americanefficient(dot)com>
To: Sebastian Dressler <sebastian(at)swarm64(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: increased max_parallel_workers_per_gather results in fewer workers?
Date: 2020-06-04 18:37:40
Message-ID: 63B14096-D061-4051-9506-04DCEABDDD15@americanefficient.com
Lists: pgsql-performance

> On Jun 4, 2020, at 1:45 PM, Sebastian Dressler <sebastian(at)swarm64(dot)com> wrote:
>
> Hi Philip,
>
>> On 4. Jun 2020, at 18:41, Philip Semanchuk <philip(at)americanefficient(dot)com> wrote:
>> [...]
>>
>>> Also, there are more configuration settings related to parallel queries you might want to look into. Most notably:
>>>
>>> parallel_setup_cost
>>> parallel_tuple_cost
>>> min_parallel_table_scan_size
>>>
>>> Especially the last one is a typical dealbreaker; you can try setting it to 0 to start. Good starting values for the others are 500 and 0.1, respectively.
>>
>> Aha! By setting min_parallel_table_scan_size=0, Postgres uses the 6 workers I expect, and the execution time decreases nicely.
>>
>> I posted a clumsily-anonymized plan for the “bad” scenario here --
>> https://gist.github.com/osvenskan/ea00aa71abaa9697ade0ab7c1f3b705b
>>
>> There are 3 sort nodes in the plan. When I get the “bad” behavior, the sorts have one worker; when I get the good behavior, they have multiple workers (e.g. 6).
>
> I also think what Luis pointed out earlier might be a good option for you, i.e. setting
>
> parallel_leader_participation = off;
>
> And by the way, this 1 worker actually turns into 2 workers in total with leader participation enabled.

I’ll try that out, thanks.
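
In case it's useful to anyone following the thread, here's roughly what I plan to run (the count(*) over "fact" is just a stand-in for our real query):

    SET max_parallel_workers_per_gather = 6;
    SET min_parallel_table_scan_size = 0;
    SET parallel_leader_participation = off;

    -- Stand-in for our real query; the Gather node in the output
    -- reports "Workers Planned" and "Workers Launched".
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT count(*) FROM fact WHERE year = 2018;

I'll compare the Gather node's "Workers Planned"/"Workers Launched" with and without that last SET.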

>
>> This brings up a couple of questions —
>> 1) I’ve read that this is Postgres’ formula for the max # of workers it will consider for a table —
>>
>> max_workers = log3(table size / min_parallel_table_scan_size)
>>
>> Does that use the raw table size, or does the planner use statistics to estimate the size of the subset of the table that will be read before allocating workers?
>
> "table size" is the number of PSQL pages, i.e. relation-size / 8 kB. This comes from statistics.

OK, so it sounds like the planner does *not* use the values in pg_stats when planning workers, is that right?
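
To check my own understanding of that formula, here's a sketch of it in SQL. 'my_table' is a placeholder, and I'm assuming the default min_parallel_table_scan_size of 8MB (1024 8kB pages); I believe the planner starts at one worker and adds one each time the table exceeds three times the threshold, hence the "1 +":

    -- Rough estimate of the max workers the planner would consider,
    -- per the log3 formula above; only meaningful when relpages >= 1024.
    SELECT relname,
           relpages,
           1 + floor(log(3, relpages / 1024.0)) AS est_max_workers
    FROM pg_class
    WHERE relname = 'my_table';  -- placeholder table name

If relpages (refreshed by VACUUM/ANALYZE) is what feeds that formula, that would square with "this comes from statistics" without involving pg_stats at all.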

I’m still trying to understand one thing I’ve observed. I can run the query that produced the plan in the gist I linked to above with max_parallel_workers_per_gather=6 and the year param = 2018, and I get 6 workers. When I set the year param=2022 I get only one worker. Same tables, same query, different parameter. That suggests to me that the planner is using pg_stats when allocating workers, but I can imagine there might be other things going on that I don’t understand. (I haven’t ruled out that this might be an AWS-specific quirk, either.)
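
For reference, this is the side-by-side check I've been running, with table and column names changed to stand-ins:

    EXPLAIN (ANALYZE)
    SELECT count(*) FROM fact WHERE year = 2018;
    -- Gather node reports Workers Planned: 6

    EXPLAIN (ANALYZE)
    SELECT count(*) FROM fact WHERE year = 2022;
    -- Gather node reports Workers Planned: 1

Same session, same settings; only the year literal differs.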

Cheers
Philip
