Re: Partition-wise join for join between (declaratively) partitioned tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-04-21 22:22:36
Message-ID: CA+TgmoZ_dzramvxH0FEaLzVhPVXvf9fnH9U3rf8OrZ77=EOdoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 21, 2017 at 8:41 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> I don't see how is that fixed. For a join relation we need to come up
> with one set of partition bounds by merging partition bounds of the
> joining relation and in order to understand how to interpret the
> datums in the partition bounds, we need to associate data types. The
> question is which data type we should use if the relations being
> joined have different data types associated with their respective
> partition bounds.
>
> Or are you saying that we don't need to associate data type with
> merged partition bounds? In that case, I don't know how do we compare
> the partition bounds of two relations?

Well, since there is no guarantee that a datatype exists which can be
used to "merge" the partition bounds in the sense that you are
describing, and even if there is one we have no opfamily
infrastructure to find out which one it is, I think it would be smart
to try to set things up so that we don't need to do that. I believe
that's probably possible.

> In your example, A has partition key of type int8, has bound datums
> X1.. X10. B has partition key of type int4 and has bounds datums X1 ..
> X11. C has partition key type int2 and bound datums X1 .. X12.

OK, sure.

> The binary representation of X's is going to differ between A, B and C
> although each Xk for A, B and C is equal, wherever exists.

Agreed.

> Join
> between A and B will have merged bound datums X1 .. X10 (and X11
> depending upon the join type). In order to match bounds of AB with C,
> we need to know the data type of bounds of AB, so that we can choose
> appropriate equality operator. The question is what should we choose
> as data type of partition bounds of AB, int8 or int4. This is
> different from applying join conditions between AB and C, which can
> choose the right opfamily operator based on the join conditions.

Well, the join is actually being performed either on A.keycol =
C.keycol or on B.keycol = C.keycol, right? It has to be one or the
other; there's no "merged" join column in any relation's targetlist,
but only columns derived from the various baserels. So let's use that
set of bounds for the matching. It makes sense to use the set of
bounds for the matching that corresponds to the column actually being
joined, I think.

It's late here and I'm tired, but it seems like it should be possible
to relate the child joinrels of the AB join back to the child joinrels
of either A or B. (AB)1 .. (AB)10 related back to A1 .. A10 and B1 ..
B10. (AB)11 relates back to B11 but, of course not to A11, which
doesn't exist. If the join is INNER, (AB)11 is a dummy rel anyway and
actually we should probably see whether we can omit it altogether. If
the join is an outer join of some kind, there's an interesting case
where the user wrote A LEFT JOIN B or B RIGHT JOIN A so that A is not
on the nullable side of the join; in that case, too, (AB)11 is dummy
or nonexistent. Otherwise, assuming A is nullable, (AB)11 maps only
to B11 and not to A11. But that's absolutely right: if the join to C
uses A.keycol, either the join operator is strict and (AB)11 won't
match anything anyway, or it's not and partition-wise join is illegal
because A.keycol in (AB)11 can include not only values from X11 but
also nulls.

So, it seems to me that what you can do is loop over the childrels on
the outer side of the join. For each one, you've got a join clause
that relates the outer rel to the inner rel, and that join clause
mentions some baserel which is contained in the joinrel. So drill
down through the childrel to the corresponding partition of the
baserel and get those bounds. Then if you do the same thing for the
inner childrels, you've now got two lists of bounds, and the type on
the left matches the outer side of the join and the type on the right
matches the inner side of the join and the opfamily of the operator in
the join clause gives you a comparison operator that relates those two
types, and now you can match them up.

(We should also keep in mind the case where there are multiple columns
in the partition key.)

> This seems to suggest that we can not come up with merged bounds for
> join if the partition key types of joining relations differ.

Yes, I think that would be difficult.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-04-21 22:23:59 Re: Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?
Previous Message Michael Paquier 2017-04-21 22:20:54 Re: scram and \password