Re: Enabling parallelism for queries coming from SQL or other PL functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling parallelism for queries coming from SQL or other PL functions
Date: 2017-02-26 10:44:31
Message-ID: CA+TgmoZFmDrhSbbbpBb3AEFA3Zwj9izmXrHi9OqP-83J3GEjLg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> Sure, but that should only happen if the function is *not* declared as
>>> parallel safe (i.e., in parallel-safe functions, we should not generate
>>> parallel plans).
>>
>> So basically we want to add a restriction that a parallel-safe
>> function cannot use parallel query? This will work, but it seems too
>> restrictive to me. Marking a function parallel-safe allows it to be
>> used within an outer parallel query, which is fine. But that should
>> not prevent the function from using parallel query when it is called
>> from an outer query that does not have a parallel plan (or when the
>> function is executed directly).
>
> I think if the user is explicitly marking a function as parallel-safe,
> then it doesn't make much sense to allow parallel query in such
> functions as it won't be feasible for the planner (or at least it will
> be quite expensive) to detect the same. By the way, if the user has
> any such expectation from a function, then he can mark the function as
> parallel-restricted or parallel-unsafe.

However, even if a function is marked parallel-safe, the query that
calls it might not end up getting run in parallel. In that case, the
function could still benefit from parallelism internally. I think we
want to allow that. For example, suppose you run a query like:

SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;

If sometab isn't very big, it's probably better to use a non-parallel
plan for this query, because then somewhat_expensive_function() can
still use parallelism internally, which might be the faster option.
However, if sometab is large enough, then it might be better to
parallelize the whole query using Partial Aggregate and Finalize
Aggregate nodes and force each call to somewhat_expensive_function()
to run serially. So I don't think a hard-and-fast rule that
parallel-safe functions can't use parallelism internally is a good
idea.
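
To make the scenario concrete, here is a rough sketch of what such a
setup might look like. The table definitions, the lookup_tab table, and
the function body are assumptions made up for illustration; only
sometab, x, y, and somewhat_expensive_function() come from the query
above. Whether the query inside the function actually gets a parallel
plan depends on the server version and settings, which is exactly what
this thread is trying to sort out.

```sql
-- Hypothetical tables; names and columns are illustrative only.
CREATE TABLE sometab (x int, y int);
CREATE TABLE lookup_tab (y int, payload numeric);

-- Declared PARALLEL SAFE, so the planner is allowed to call it from
-- parallel workers when the outer query is parallelized. The body is
-- itself a query that could, in principle, benefit from a parallel
-- scan when the outer query runs with a non-parallel plan.
CREATE FUNCTION somewhat_expensive_function(y int) RETURNS numeric
LANGUAGE sql STABLE PARALLEL SAFE
AS $$
    SELECT sum(payload) FROM lookup_tab WHERE lookup_tab.y = $1
$$;

-- Inspect which strategy the planner picks for the outer query.
EXPLAIN
SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;
```

With a small sometab one would expect a plain aggregate plan at the
outer level; with a large one, a Gather over partial aggregation
becomes more likely, and the function calls then run inside workers.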

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
