Re: increased max_parallel_workers_per_gather results in fewer workers?

From: Sebastian Dressler <sebastian(at)swarm64(dot)com>
To: Philip Semanchuk <philip(at)americanefficient(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: increased max_parallel_workers_per_gather results in fewer workers?
Date: 2020-06-04 19:03:26
Message-ID: A6D2F42F-2E2C-4EA9-91AD-AA4D6B1F47A9@swarm64.com
Lists: pgsql-performance

Hi Philip,

On 4. Jun 2020, at 20:37, Philip Semanchuk <philip(at)americanefficient(dot)com> wrote:

> [...]

>>> This brings up a couple of questions —
>>> 1) I’ve read that this is Postgres’ formula for the max # of workers it will consider for a table —
>>>
>>>     max_workers = log3(table size / min_parallel_table_scan_size)
>>>
>>> Does that use the raw table size, or does the planner use statistics to estimate the size of the subset of the table that will be read before allocating workers?

"table size" is the number of PSQL pages, i.e. relation-size / 8 kB. This comes from statistics.

OK, so it sounds like the planner does *not* use the values in pg_stats when planning workers, true?

Full disclosure: I am not too deep into these internals; likely others on the list know much more about it. But with respect to the relation size, I think this is tracked elsewhere (pg_class.relpages rather than pg_stats), and it can be affected by other things such as vacuuming. The estimated amount of the table that will be scanned probably also plays a role.
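
For illustration, here is a rough SQL sketch of the formula quoted above (my own simplification, not the planner's exact code; it assumes the default min_parallel_table_scan_size of 8MB, i.e. 1024 pages of 8 kB, and 'my_table' is a hypothetical table name):

    -- Rough sketch: approximate the planner's per-table worker cap from
    -- relpages, per the simplified log3 formula quoted above. Assumes
    -- min_parallel_table_scan_size = 8MB, i.e. 1024 pages of 8 kB.
    SELECT relname,
           relpages,
           CASE WHEN relpages < 1024 THEN 0
                ELSE 1 + FLOOR(LN(relpages / 1024.0) / LN(3))::int
           END AS est_max_workers
    FROM pg_class
    WHERE relname = 'my_table';  -- hypothetical table name

Note that relpages is only refreshed by VACUUM, ANALYZE, and a few DDL commands, so it can lag behind the actual file size.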

> I’m still trying to understand one thing I’ve observed. I can run the query that produced the plan in the gist I linked to above with max_parallel_workers_per_gather=6 and the year param = 2018, and I get 6 workers. When I set the year param=2022 I get only one worker. Same tables, same query, different parameter. That suggests to me that the planner is using pg_stats when allocating workers, but I can imagine there might be other things going on that I don’t understand. (I haven’t ruled out that this might be an AWS-specific quirk, either.)

I think it would be helpful if you could post both plans again. Ideally, use https://explain.dalibo.com/ and share the links. You will have to generate the plans in JSON format, but you can still anonymize them.
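
For example, something like this for each parameter value (table, column, and query are placeholders for your actual statement):

    -- Sketch: produce JSON-format plans for both parameter values so the
    -- row estimates and worker counts can be compared side by side.
    EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
    SELECT count(*) FROM my_table WHERE year = 2018;  -- placeholder query

    EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
    SELECT count(*) FROM my_table WHERE year = 2022;  -- placeholder query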

Obviously, the plan changes when the parameter value changes, and comparing the two plans will very likely reveal why. My guess would be that something in the estimated number of rows changes, causing PG to prefer a different, lower-cost plan.

Also, maybe on that occasion, check the default_statistics_target parameter, which defaults to 100; for analytical cases like (I assume) yours, higher values tend to improve planning. You can try, for instance, 1000 or 2500. Instead of changing this parameter globally, you can also adjust it per column (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS).
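
A sketch of both variants (table and column names are hypothetical; the new target only takes effect after re-analyzing):

    -- Session-wide (can also be set in postgresql.conf or via ALTER SYSTEM):
    SET default_statistics_target = 1000;

    -- Or per column, for the column(s) driving the row estimates:
    ALTER TABLE my_table ALTER COLUMN year SET STATISTICS 1000;

    -- Re-collect statistics so the new target is actually used:
    ANALYZE my_table;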

Cheers,
Sebastian

--

Sebastian Dressler, Solution Architect
+49 30 994 0496 72 | sebastian(at)swarm64(dot)com

Swarm64 AS
Parkveien 41 B | 0258 Oslo | Norway
Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender (Styrets Leder): Dr. Sverre Munck

Swarm64 AS Zweigstelle Hive
Ullsteinstr. 120 | 12109 Berlin | Germany
Registered at Amtsgericht Charlottenburg - HRB 154382 B
