From: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Append implementation |
Date: | 2017-04-07 06:14:39 |
Message-ID: | CAJ3gD9f_1akNqaUDuBO2ev0zuvbYOY44wM6Zep-S2SuG_vc9Kw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 6 April 2017 at 07:33, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
>> This is what the earlier versions of my patch had done : just add up
>> per-subplan parallel_workers (1 for non-partial subplan and
>> subpath->parallel_workers for partial subplans) and set this total as
>> the Append parallel_workers.
>
> I don't think that's great, consider e.g. the case that you have one
> very expensive query, and a bunch of cheaper ones. Most of those workers
> wouldn't do much while waiting for the the expensive query. What I'm
> basically thinking we should do is something like the following
> pythonesque pseudocode:
>
> best_nonpartial_cost = -1
> best_nonpartial_nworkers = -1
>
> for numworkers in 1...#max workers:
> worker_work = [0 for x in range(0, numworkers)]
>
> nonpartial_cost += startup_cost * numworkers
>
> # distribute all nonpartial tasks over workers. Assign tasks to the
> # worker with the least amount of work already performed.
> for task in all_nonpartial_subqueries:
> least_busy_worker = worker_work.smallest()
> least_busy_worker += task.total_nonpartial_cost
>
> # the nonpartial cost here is the largest amount any single worker
> # has to perform.
> nonpartial_cost += worker_work.largest()
>
> total_partial_cost = 0
> for task in all_partial_subqueries:
> total_partial_cost += task.total_nonpartial_cost
>
> # Compute resources needed by partial tasks. First compute how much
> # cost we can distribute to workers that take shorter than the
> # "busiest" worker doing non-partial tasks.
> remaining_avail_work = 0
> for i in range(0, numworkers):
> remaining_avail_work += worker_work.largest() - worker_work[i]
>
> # Equally divide up remaining work over all workers
> if remaining_avail_work < total_partial_cost:
> nonpartial_cost += (worker_work.largest - remaining_avail_work) / numworkers
>
> # check if this is the best number of workers
> if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost:
> best_nonpartial_cost = worker_work.largest
> best_nonpartial_nworkers = nworkers
>
> Does that make sense?
Yeah, I gather what you are trying to achieve is : allocate number of
workers such that the total cost does not exceed the cost of the first
non-partial plan (i.e. the costliest one, because the plans are sorted
by descending cost).
So for non-partial costs such as (20, 10, 5, 2) allocate only 2
workers because the 2nd worker will execute (10, 5, 2) while 1st
worker executes (20).
But for costs such as (4, 4, 4, .... 20 times), the logic would give
us 20 workers because we want to finish the Append in 4 time units;
and this what we want to avoid when we go with
don't-allocate-too-many-workers approach.
>
>
>> BTW all of the above points apply only for non-partial plans.
>
> Indeed. But I think that's going to be a pretty common type of plan,
Yes it is.
> especially if we get partitionwise joins.
About that I am not sure, because we already have support for parallel
joins, so wouldn't the join subpaths corresponding to all of the
partitions be partial paths ? I may be wrong about that.
But if the subplans are foreign scans, then yes all would be
non-partial plans. This may provoke off-topic discussion, but here
instead of assigning so many workers to all these foreign plans and
all those workers waiting for the results, a single asynchronous
execution node (which is still in the making) would be desirable
because it would do the job of all these workers.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro HORIGUCHI | 2017-04-07 06:32:28 | Duplicate assignment in Unicode/convutils.pm |
Previous Message | Kyotaro HORIGUCHI | 2017-04-07 05:59:45 | shift_sjis_2004 related autority files are remaining |