Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-15 12:49:25
Message-ID: CAFjFpRdkfsRzdAuki1Z6WjoZvDVKq1VjXJFYRFWiDz8qASV3tA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>
>> There are some differences in what geqo does and what partition-wise
>> needs to do. geqo tries many joining orders each one in a separate
>> temporary context. The way geqo slices the work, every slice produces
>> a full plan. For partition-wise join I do not see a way to slice the
>> work such that the whole path and corresponding RelOptInfos come from
>> the same slice. So, we can't use the same method as GEQO.
>
> What I was thinking about was the use of this technique for getting
> rid of joinrels:
>
> root->join_rel_list = list_truncate(root->join_rel_list,
> savelength);
> root->join_rel_hash = savehash;
>
> makePathNode() serves to segregate paths into a separate memory
> context that can then be destroyed, but as you point out, the path
> lists are still hanging around, and so are the RelOptInfo nodes. It
> seems to me we could do a lot better using this technique. Suppose we
> jigger things so that the List objects created by add_path go into
> path_cxt, and so that RelOptInfo nodes also go into path_cxt. Then
> when we blow up path_cxt we won't have dangling pointers in the
> RelOptInfo objects any more because the RelOptInfos themselves will be
> gone. The only problem is that the join_rel_list (and join_rel_hash
> if it exists) will be corrupt, but we can fix that using the technique
> demonstrated above.
>
> Of course, that supposes that 0009 can manage to postpone creating
> non-sampled child joinrels until create_partition_join_plan(), which
> it currently doesn't.

Right. We need the child-join's RelOptInfos to estimate sizes, so that
we could sample the largest ones. So postponing it looks difficult.

> In fact, unless I'm missing something, 0009
> hasn't been even slightly adapted to take advantage of the
> infrastructure in 0001; it doesn't seem to reset the path_cxt or
> anything. That seems like a fairly major omission.

The path_cxt reset introduced by 0001 recycles memory used by all the
paths, including paths created for the children. But that happens only
after all the planning has completed. I thought that's what we
discussed to be done. We could create a separate path context for
every top-level child-join. That will require either copying the
cheapest path-tree into root->glob->path_cxt memory context OR will
require it to be converted to a plan immediately. The first will
require spending CPU cycles and memory in copying path-tree. The later
requires almost all the create_*_append_plan() code to be duplicated
in create_partition_join_plan() which is ugly. In an earlier version
of this patch I had that code, which I got rid of in the latest set of
patches. Between those two the first looks better.

>
> Incidentally, I committed 0002, 0003, and 0005 as a single commit with
> a few tweaks; I think you may need to do a bit of rebasing.

Thanks. I will have fewer patches to rebase now :).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-03-15 12:55:22 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Amit Kapila 2017-03-15 12:35:49 Re: Microvacuum support for Hash Index