Re: Parallel Append can break run-time partition pruning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, amitdkhan(dot)pg(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Append can break run-time partition pruning
Date: 2020-04-21 03:03:18
Message-ID: CAApHDvq5BLXmXLFxDbmYUagmtYfVNFc-YSkVZbhuXOcM+--+Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 17 Apr 2020 at 19:08, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Wed, Apr 15, 2020 at 4:18 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I'm a bit divided on what the correct fix is. If I blame Parallel
> > Append for not trying hard enough to pull up the lower Append in
> > accumulate_append_subpath(), then clearly the parallel append code is
> > to blame.
>
> I spent some time trying to understand how Append parallelism works
> and I am tempted to agree with you that there might be problems with
> how accumulate_append_subpath()'s interacts with parallelism. Maybe it
> would be better to disregard a non-parallel-aware partial Append if it
> requires us to fail on flattening a child Append. I have as attached
> a PoC fix to show that. While a nested Append is not really a problem
> in general, it appears to me that our run-time code is not in position
> to work correctly with them, or at least not with how things stand
> today...

Thanks for taking a look at this. I've now looked at this in more
detail and below is my understanding of what's going on:

It seems, in this case, what's going on is, on the following line:

accumulate_append_subpath(cheapest_partial_path,
&partial_subpaths, NULL);

we don't manage to pullup the sub-Append due to passing a NULL pointer
for the final special_subpaths argument. This results in just taking
the child's Append path verbatim. i.e. nested Append

Later, when we do:

else if (nppath == NULL ||
(cheapest_partial_path != NULL &&
cheapest_partial_path->total_cost < nppath->total_cost))
{
/* Partial path is cheaper or the only option. */
Assert(cheapest_partial_path != NULL);
accumulate_append_subpath(cheapest_partial_path,
&pa_partial_subpaths,
&pa_nonpartial_subpaths);

we do pass a non-NULL special_subpaths argument to allow the
sub-Append to be pulled up.

So, now we have 2 paths, one with a nested Append and one with a
flattened Append. Both paths have the same cost, but due to the fact
that we call add_partial_path() for the nested Append version first,
the logic in add_partial_path() accepts that path. However, the
subsequent call of add_partial_path(), the one for the non-nested
Append, that path is rejected due to the total cost being too similar
to one of the existing partial path. We just end up keeping the nested
Append as the cheapest partial path... That path, since in the example
case only has a single subpath, is pulled up into the main append by
the logic added in 8edd0e794.

I think you've realised this and that's why your PoC patch just
rejected the first path when it's unable to do the pullup. We'll get a
better path later when we allow mixed partial and non-partial paths.

(We'll never fail to do a pullup when calling
accumulate_append_subpath() for "nppath", since that's a non-parallel
path and accumulate_append_subpath() will always pull Append paths up
when they're not parallel aware.)

I wonder if the fix should be more something along the lines of trying
to merge things do we only generate a single partial path. That way
we wouldn't be at the mercy of the logic in add_partial_path() to
accept or reject the path based on the order the paths are added.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-04-21 03:09:25 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Bruce Momjian 2020-04-21 02:35:15 Re: Do we need to handle orphaned prepared transactions in the server?