Re: paths in partitions of a dummy partitioned table

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: paths in partitions of a dummy partitioned table
Date: 2017-09-06 07:08:24
Message-ID: CAFjFpRcx4BzpzmLHha-PeoZ6csYmyTSRMW5Wc6nvfD=f9=yq6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 25, 2017 at 10:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
>> partition relations dummy and thus doesn't set any (dummy) paths in the
>> partition relations. The lack of paths in the partitions means that we can
>> not use partition-wise join while joining this table with some other similarly
>> partitioned table as the partitions do not have any paths that child-joins can
>> use. This means that we can not create partition relations for such a join and
>> thus can not consider the join to be partitioned. This doesn't matter much when
>> the dummy relation is the outer relation, since the resultant join is also
>> dummy. But when the dummy relation is inner relation, the resultant join is not
>> dummy and can be considered to be partitioned with same partitioning scheme as
>> the outer relation to be joined with other similarly partitioned table. Not
>> having paths in the partitions deprives us of this future optimization.
>
> I think it's wrong for any code to be examining the path list for a
> rel marked dummy, so I would suggest approaching this from a different
> direction.

Me and Robert had an offline discussion about this. I am summarizing
it here for the sake of completeness.

A dummy relation is identified by the only dummy path that exists in
its pathlist. There is no flag in RelOptInfo which tells whether a
given relation is dummy or not, it's the dummy path which tells that.
A dummy path is an Append path with no subpaths. Planner doesn't treat
dummy relations any different from other relations when it comes to
using paths. When a dummy relation participates in a join, the dummy
path is used as one of the joining paths and converted to a Result
plan at the time of planning. So, for a partition-wise join where one
of the joining relations is dummy, its every child must have dummy
path which can be used to construct child-join paths.

But we don't need to mark partition relations dummy (if their parent
is dummy) even when it's not going to participate in partition-wise
join. The partition relations will be marked dummy when we know that
they will be required for partition-wise join. I was worried that we
might mark base relation dummy during join planning this way, but we
already have a precedence for that in add_paths_to_join_rel(). So,
shouldn't be a problem. So, I have now added a patch in partition-wise
join set to mark partition relations dummy when their parent is dummy.

> Given A LEFT JOIN B where Bk is dummy, I suggest
> constructing the path for (AB)k by taking a path from Ak and applying
> an appropriate PathTarget. You don't really need a join path at all;
> a path for the non-dummy input is fine - and, in fact, better, since
> it will be cheaper to execute. One problem is that it may not produce
> the correct output columns. (AB) may drop some columns that were
> being generated by A because they were only needed to perform the
> join, and it may add additional columns taken from B. But I think
> that if you take the default PathTarget for (AB) and replace
> references to columns of B with NULL constants, you should be able to
> apply the resulting PathTarget to a path for Ak and get a valid path
> for (AB)k. Maybe there is some reason why that won't work exactly,
> but I think it is probably more promising to attack the problem from
> this angle than to do what you propose. Sticking dummy joins into the
> query plan that are really just projecting out NULLs is not appealing.
>

This might help in the cases when the RelOptInfo itself is missing
e.g. missing partitions in partition matching as discussed in [1]. I
will discuss this approach on that thread.

[1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-06 07:11:55 Re: CLUSTER command progress monitor
Previous Message Andres Freund 2017-09-06 07:03:40 Changing Jobs