From: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
---|---|
To: | Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Increasing parallel workers at runtime |
Date: | 2017-05-22 11:59:41 |
Message-ID: | CAGz5QCKAif1BZLumQe6eufG=O=7rYWxvFhBNi0ER9vNOMbdG7g@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, May 22, 2017 at 2:54 PM, Rafia Sabih
<rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
> On Wed, May 17, 2017 at 2:57 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, May 16, 2017 at 2:14 PM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> On Mon, May 15, 2017 at 9:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>
>>> Also, looking at the patch, it doesn't look like it takes enough care
>>> to build the execution state of a new worker so that it can participate
>>> in a running query. I may be wrong, but aren't the execution state
>>> initialization routines written with the assumption that all the
>>> workers start simultaneously?
>>>
>>
>> There is no such assumption; workers started later can also join the
>> execution of the query.
>>
> If we are talking about run-time allocation of workers, I'd like to
> propose an idea to safeguard parallelism from selectivity-estimation
> errors. Start each query (if it qualifies for the use of parallelism)
> with a minimum number of workers (say 2) irrespective of the number of
> planned workers. Then, as the query proceeds and we find that there is
> more work to do, we allocate more workers.
>
> Let's get into the details a little. We'll have the following new variables:
> - T_int - a time interval at which we'll periodically check whether the
> query requires more workers,
> - work_remaining - a variable which estimates the work yet to be done. This
> will use the selectivity estimates to find the total work done and the
> remaining work accordingly. Once the actual number of rows crosses
> the estimated number of rows, take the maximum possible tuples for that
> operator as the new estimate.
>
> Now, we'll check at Gather, after each T_int, whether work is remaining
> and, if so, allocate another 2 (say) workers. This way we'll keep on
> adding workers in small chunks and not in one go, thus saving resources
> in case of over-estimation.
>
I understand your concern about selectivity estimation errors, which
affect the number of workers planned as well. But in that case, I
would rather fix the optimizer so that it calculates the number of
workers correctly. If the optimizer thinks that we should start with n
workers, we probably SHOULD start with n workers.
However, errors in selectivity estimation (The root of all evil, the
Achilles Heel of query optimization, according to Guy Lohman et al.
:)) can always prove the optimizer wrong. In that case, +1 for your
suggested approach of dynamically adding or killing workers based on
the estimated work left to do.
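
To make the quoted proposal concrete, here is a rough simulation sketch in
Python (not PostgreSQL code, and not taken from any patch). All names in it
(GatherState, check_after_interval, simulate, max_workers, rows_per_worker)
are hypothetical and exist only for illustration. It assumes each worker
processes a fixed number of rows per T_int and that some upper limit on
workers exists; neither assumption comes from the proposal itself. It only
covers adding workers, not killing them.

```python
from dataclasses import dataclass


@dataclass
class GatherState:
    estimated_rows: int     # planner's row estimate for the operator
    max_possible_rows: int  # upper bound on tuples for the operator
    max_workers: int        # assumed cap on workers (not in the proposal)
    workers: int = 2        # start small, irrespective of planned workers
    rows_done: int = 0      # rows actually processed so far


def work_remaining(state):
    """Estimate the rows still to be processed."""
    # Once the actual rows cross the estimate, fall back to the
    # maximum possible tuples as the new estimate.
    if state.rows_done >= state.estimated_rows:
        state.estimated_rows = state.max_possible_rows
    return max(state.estimated_rows - state.rows_done, 0)


def check_after_interval(state, step=2):
    """Run at Gather after each T_int: add `step` workers if work remains."""
    if work_remaining(state) > 0:
        state.workers = min(state.workers + step, state.max_workers)
    return state.workers


def simulate(state, actual_rows, rows_per_worker):
    """Return the worker count in effect during each interval."""
    history = [state.workers]
    while state.rows_done < actual_rows:
        processed = state.workers * rows_per_worker
        state.rows_done = min(state.rows_done + processed, actual_rows)
        if state.rows_done < actual_rows:
            history.append(check_after_interval(state))
    return history


# Under-estimation: the planner expected 100 rows, the scan yields 500.
under = GatherState(estimated_rows=100, max_possible_rows=1000, max_workers=8)
print(simulate(under, actual_rows=500, rows_per_worker=25))

# Over-estimation: the planner expected 1000 rows, the scan yields 40.
over = GatherState(estimated_rows=1000, max_possible_rows=1000, max_workers=8)
print(simulate(over, actual_rows=40, rows_per_worker=25))
```

In the under-estimated case the worker count grows in steps of two until it
hits the assumed cap; in the over-estimated case the query finishes within
the first interval and never asks for more than the initial two workers.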
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com