Re: why not parallel seq scan for slow functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-09-06 19:53:03
Message-ID: CA+Tgmoa_8GCOAsyEiS5Y1eMahMFBifLzX-yXKMjiZP-Dt8jFJA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> In particular, as Jeff and Amit point out, it
>> may well be that (a) before apply_projection_to_path(), the cheapest
>> plan is non-parallel and (b) after apply_projection_to_path(), the
>> cheapest plan would be a Gather plan, except that it's too late
>> because we've already thrown that path out.
>
> I'm not entirely following. I thought that add_path was set up to treat
> "can be parallelized" as an independent dimension of merit, so that
> parallel paths would always survive.

It treats parallel-safety as an independent dimension of merit; a
parallel-safe plan is more meritorious than one of equal cost which is
not. We need that because, for example, forming a partial
path for a join means joining a partial path to a parallel-safe path.
But that doesn't help us here; that's to make sure we can build the
necessary stuff *below* the Gather. IOW, if we threw away
parallel-safe paths because there was a cheaper parallel-restricted
path, we might be unable to build a partial path for the join *at
all*.
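
As a rough illustration of "independent dimension of merit", here is a toy sketch in Python. It is not the actual add_path() logic, which also weighs things like startup cost, pathkeys and row counts; the names ToyPath, dominates and toy_add_path, and all the cost numbers, are made up for this example. A path is only discarded if some other path is at least as good on both cost and parallel-safety and strictly better on one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPath:
    # Hypothetical stand-in for a planner path; only two dimensions are modeled.
    name: str
    total_cost: float
    parallel_safe: bool


def dominates(a: ToyPath, b: ToyPath) -> bool:
    """True if a is no worse than b on both dimensions and better on at least one."""
    no_worse = a.total_cost <= b.total_cost and (a.parallel_safe or not b.parallel_safe)
    better = a.total_cost < b.total_cost or (a.parallel_safe and not b.parallel_safe)
    return no_worse and better


def toy_add_path(paths: list, new: ToyPath) -> list:
    """Return the path list after considering `new` (simplified assumption)."""
    if any(dominates(old, new) for old in paths):
        return paths  # new path loses on every dimension; drop it
    # Keep the new path and evict any old paths it dominates.
    return [old for old in paths if not dominates(new, old)] + [new]


restricted = ToyPath("parallel-restricted", 100.0, False)
safe = ToyPath("parallel-safe", 120.0, True)

paths = toy_add_path([], restricted)
paths = toy_add_path(paths, safe)
print([p.name for p in paths])
```

In this toy model the more expensive parallel-safe path survives next to the cheaper parallel-restricted one, so it is still available for building a partial join path. A path that is both more expensive and not parallel-safe would be rejected outright.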

Here, the Gather path is not parallel-safe, but rather
parallel-restricted: it's OK for it to exist in a plan that uses
parallelism (duh), but it can't be nested under another Gather (also
duh, kinda). So before accounting for the differing projection cost,
the Gather path is doubly inferior: it is more expensive AND not
parallel-safe, whereas the competing non-parallel plan is both cheaper
AND parallel-safe. After applying the expensive target list, the
parallel-safe plan gets a lot more expensive, but the Gather path gets
more expensive to a lesser degree because the projection step ends up
below the Gather and thus happens in parallel. So now the Gather plan,
still a loser on parallel-safety, is a winner on total cost and thus
ought to have been retained and, in fact, ought to have won. Instead,
we threw it out too early.
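
The cost flip can be sketched with some back-of-the-envelope arithmetic. This is a deliberately simplified model, not the planner's real costing: the functions serial_cost and gather_cost, the even split of work across workers, and every number below are assumptions chosen only to show the shape of the problem.

```python
def serial_cost(scan_cost: float, rows: int, proj_cost_per_row: float) -> float:
    """Non-parallel plan: scan everything, then project every row in one process."""
    return scan_cost + rows * proj_cost_per_row


def gather_cost(scan_cost: float, rows: int, proj_cost_per_row: float,
                workers: int, gather_overhead: float) -> float:
    """Gather plan: scan and projection happen below the Gather, so both are
    divided among the workers (assumed even split); the Gather itself adds a
    fixed overhead for worker startup and tuple transfer."""
    return (scan_cost + rows * proj_cost_per_row) / workers + gather_overhead


# Hypothetical inputs.
SCAN, ROWS, WORKERS, OVERHEAD = 1000.0, 10000, 4, 1100.0

# Before the expensive target list is applied (projection treated as free).
before = (serial_cost(SCAN, ROWS, 0.0),
          gather_cost(SCAN, ROWS, 0.0, WORKERS, OVERHEAD))

# After applying a target list costing 1.0 per row.
after = (serial_cost(SCAN, ROWS, 1.0),
         gather_cost(SCAN, ROWS, 1.0, WORKERS, OVERHEAD))

print("before projection: serial=%.0f gather=%.0f" % before)
print("after projection:  serial=%.0f gather=%.0f" % after)
```

With these made-up numbers the serial plan wins before projection (1000 vs. 1350) and the Gather plan wins afterwards (11000 vs. 3850). If the Gather path was already discarded at the first comparison, the second comparison never gets the chance to pick it.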

>> What we ought to do, I think, is avoid generating gather paths until
>> after we've applied the target list (and the associated costing
>> changes) to both the regular path list and the partial path list.
>
> Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it? :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
