From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-04-06 02:03:23
Message-ID: 20170406020323.ef6tyffrg6lzdpvw@alap3.anarazel.de
Lists: pgsql-hackers
On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
> This is what the earlier versions of my patch had done: just add up
> per-subplan parallel_workers (1 for non-partial subplan and
> subpath->parallel_workers for partial subplans) and set this total as
> the Append parallel_workers.
I don't think that's great; consider e.g. the case where you have one
very expensive query and a bunch of cheaper ones. Most of those workers
wouldn't do much while waiting for the expensive query. What I'm
basically thinking we should do is something like the following
pythonesque pseudocode:
best_cost = -1
best_nworkers = -1

for numworkers in range(1, max_workers + 1):
    worker_work = [0 for x in range(0, numworkers)]

    cost = startup_cost * numworkers

    # distribute all nonpartial tasks over workers. Assign each task to
    # the worker with the least amount of work already assigned.
    for task in all_nonpartial_subqueries:
        least_busy_worker = worker_work.index(min(worker_work))
        worker_work[least_busy_worker] += task.total_cost

    # the nonpartial contribution is the largest amount of work any
    # single worker has to perform.
    cost += max(worker_work)

    total_partial_cost = 0
    for task in all_partial_subqueries:
        total_partial_cost += task.total_cost

    # Compute resources needed by partial tasks. First compute how much
    # cost we can distribute to workers that finish sooner than the
    # "busiest" worker doing nonpartial tasks.
    remaining_avail_work = 0
    for i in range(0, numworkers):
        remaining_avail_work += max(worker_work) - worker_work[i]

    # Equally divide whatever doesn't fit into that slack over all workers
    if remaining_avail_work < total_partial_cost:
        cost += (total_partial_cost - remaining_avail_work) / numworkers

    # check if this is the best number of workers
    if best_cost == -1 or cost < best_cost:
        best_cost = cost
        best_nworkers = numworkers
Does that make sense?
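To make the arithmetic concrete, here's the same heuristic as a
self-contained, runnable Python sketch. The cost figures and the
best_worker_count() helper are made up for illustration (costs are just
abstract planner units); this is not actual planner code:

def best_worker_count(nonpartial_costs, partial_costs,
                      startup_cost, max_workers):
    best_cost = best_nworkers = None
    for numworkers in range(1, max_workers + 1):
        worker_work = [0.0] * numworkers
        cost = startup_cost * numworkers
        # Greedily hand each nonpartial subplan to the least-loaded worker.
        for c in nonpartial_costs:
            worker_work[worker_work.index(min(worker_work))] += c
        busiest = max(worker_work)
        cost += busiest
        # Slack the other workers have before the busiest one finishes;
        # partial work beyond that slack stretches everyone equally.
        slack = sum(busiest - w for w in worker_work)
        overflow = sum(partial_costs) - slack
        if overflow > 0:
            cost += overflow / numworkers
        if best_cost is None or cost < best_cost:
            best_cost, best_nworkers = cost, numworkers
    return best_nworkers, best_cost

# One expensive nonpartial subplan, three cheap ones, one partial one:
print(best_worker_count([100, 10, 10, 10], [40],
                        startup_cost=5, max_workers=8))
# -> (2, 110.0): a third worker would only add startup cost, because the
#    100-unit subplan already dominates the critical path.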
> BTW all of the above points apply only for non-partial plans.
Indeed. But I think that's going to be a pretty common type of plan,
especially if we get partitionwise joins.
Greetings,
Andres Freund