Re: A problem about partitionwise join

From: Richard Guo <riguo(at)pivotal(dot)io>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A problem about partitionwise join
Date: 2019-08-27 07:56:59
Message-ID: CAN_9JTzPTG5qEJ9MerarmuRoifhiejtS3fiRe+ZEZMyz-3NZuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 27, 2019 at 8:51 AM Amit Langote <amitlangote09(at)gmail(dot)com>
wrote:

> Hi Richard,
>
> On Mon, Aug 26, 2019 at 6:33 PM Richard Guo <riguo(at)pivotal(dot)io> wrote:
> >
> > Hi All,
> >
> > To generate partitionwise join, we need to make sure there exists an
> > equi-join condition for each pair of partition keys, which is performed
> > by have_partkey_equi_join(). This makes sense and works well.
> >
> > But if, let's say, one certain pair of partition keys (foo.k = bar.k)
> > has formed an equivalence class containing consts, no join clause would
> > be generated for it, since we have already generated 'foo.k = const' and
> > 'bar.k = const' and pushed them into the proper restrictions earlier.
> >
> > This will make partitionwise join fail to be planned if there are
> > multiple partition keys and the pushed-down restrictions 'xxx = const'
> > fail to prune away any partitions.
> >
> > Consider the examples below:
> >
> > create table p (k1 int, k2 int, val int) partition by range(k1,k2);
> > create table p_1 partition of p for values from (1,1) to (10,100);
> > create table p_2 partition of p for values from (10,100) to (20,200);
> >
> > If we are joining on each pair of partition keys, we can generate
> > partitionwise join:
> >
> > # explain (costs off)
> > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 =
> bar.k2;
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > Append
> > -> Hash Join
> > Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
> > -> Seq Scan on p_1 foo
> > -> Hash
> > -> Seq Scan on p_1 bar
> > -> Hash Join
> > Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
> > -> Seq Scan on p_2 foo_1
> > -> Hash
> > -> Seq Scan on p_2 bar_1
> > (11 rows)
> >
> > But if we add another qual 'foo.k2 = const', we will be unable to
> > generate partitionwise join any more, because have_partkey_equi_join()
> > thinks not every partition key has an equi-join condition.
> >
> > # explain (costs off)
> > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 =
> bar.k2 and foo.k2 = 16;
> > QUERY PLAN
> > -----------------------------------------
> > Hash Join
> > Hash Cond: (foo.k1 = bar.k1)
> > -> Append
> > -> Seq Scan on p_1 foo
> > Filter: (k2 = 16)
> > -> Seq Scan on p_2 foo_1
> > Filter: (k2 = 16)
> > -> Hash
> > -> Append
> > -> Seq Scan on p_1 bar
> > Filter: (k2 = 16)
> > -> Seq Scan on p_2 bar_1
> > Filter: (k2 = 16)
> > (13 rows)
> >
> > Is this a problem?
>
> Perhaps. Maybe it has to do with the way have_partkey_equi_join() has
> been coded. If it was coded such that it figured out on its own that
> the equivalence (foo.k2, bar.k2, ...) does exist, then that would
> allow partitionwise join to occur, which I think would be OK to do.
> But maybe I'm missing something.
>
>
This should be caused by how we deduce join clauses from equivalence
classes. ECs containing consts will not be considered so we cannot
generate (foo.k2 = bar.k2) for the query above.

In addition, when generating join clauses from equivalence classes, we
only select the joinclause with the 'best score', or the first
joinclause with a score of 3. This may make us miss some joinclause on
partition keys.

Check the query below as a more illustrative example:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
QUERY PLAN
-----------------------------------------
Append
-> Hash Join
Hash Cond: (foo.k = bar.k)
-> Seq Scan on p_1 foo
-> Hash
-> Seq Scan on p_1 bar
Filter: (k = val)
-> Hash Join
Hash Cond: (foo_1.k = bar_1.k)
-> Seq Scan on p_2 foo_1
-> Hash
-> Seq Scan on p_2 bar_1
Filter: (k = val)
(13 rows)

But if we exchange the order of the two quals to 'foo.k = bar.val and
foo.k = bar.k', then partitionwise join cannot be generated any more,
because we only have joinclause 'foo.k = bar.val' as it first reached
score of 3. We have missed the joinclause on the partition key although
it does exist.

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
QUERY PLAN
-----------------------------------------
Hash Join
Hash Cond: (foo.k = bar.val)
-> Append
-> Seq Scan on p_1 foo
-> Seq Scan on p_2 foo_1
-> Hash
-> Append
-> Seq Scan on p_1 bar
Filter: (val = k)
-> Seq Scan on p_2 bar_1
Filter: (val = k)
(11 rows)

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message movead.li@highgo.ca 2019-08-27 07:57:20 Re: Re: Email to hackers for test coverage
Previous Message Floris Van Nee 2019-08-27 07:23:18 Re: Optimize single tuple fetch from nbtree index