Re: Partitionwise join fails under GEQO

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitionwise join fails under GEQO
Date: 2020-10-06 11:42:45
Message-ID: CAExHW5u1+8r7Rbvm+qemm2xsvbEYkyJOs=v0gfhUcoRor7eBpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 5, 2020 at 11:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> writes:
> > On Mon, Oct 5, 2020 at 6:47 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Now that I've seen this, I wonder whether add_child_join_rel_equivalences
> >> might not be making duplicate EC entries even without GEQO. Is there any
> >> guarantee that it's not called repeatedly on related join-rel sets?
>
> > build_child_join_rel() is the only caller of
> > add_child_join_rel_equivalences(). Before the first calls the later,
> > the first has an Assert
> > 899 Assert(!find_join_rel(root, joinrel->relids));
> > This means that for a given child join rel this function is called
> > only once implying that add_child_join_rel_equivalences() is called
> > only once for a given child join rel. Initially I thought that this is
> > enough to not add duplicate child members. But to me use of
> > bms_overlap() in that function looks doubtful.
>
> Yeah. As soon as you get into three-or-more-way joins, this code will
> be called again, mentioning some of the same relids as at the lower
> join level.
>
> > On a related but different subject, I have been thinking on-off about
> > the need for expression translation while planning partitions.The
> > translated expressions differ only in the leaf-vars and even for most
> > of the partitioned tables really only the varno (assuming that
> > partitioned table and partitions will have same layouts when a large
> > number of partitions are created automatically.) If we can introduce a
> > new type of (polymorphic) Var node which represents not just the
> > parent Var but also child Vars as well. The whole need for expression
> > translation vanishes, reducing memory requirements by many folds.
>
> Actually, the reason I have been looking at equivclass.c is that I've
> been attacking the problem from a different direction. I'm fairly
> unexcited about making the definition of Var looser as you suggest
> here --- I actually think that it's dangerously imprecise already,
> due to not accounting for the effects of outer joins. However,
> we could avoid the growth in eclass members for large partition sets
> if we simply didn't store child eclass members, instead translating
> on-the-fly when we need to deal with child rels. I have a patch
> about half done, but it won't be possible to determine the true
> performance implications of that idea until it's all done. More
> later.

If we translate more than ones, every time someone comes searching for
an EC member, we will leak memory in the planner memory context, which
anyway gets bloated because of the huge number of translations even
when done only once. So that needs to be done rather carefully. Of
course we could save child EC members separately in the same EC and
search that using hash or something to avoid multiple instances of the
same child EC member.

But this memory bloat isn't unique to ECs, though it shows up mostly
in ECs. We translate all the expressions, TLEs, quals and so on. That
consumes a lot of memory. The number of joins grows exponentially with
the number of relations being joined and so does this child joins. So
we need some way to avoid expression translations, where the
translated expressions just differ in leaf var nodes, in order to
support a large number of partitions.

>
> Either approach would mean that add_child_join_rel_equivalences
> goes away entirely, or at least doesn't need to store new em_is_child
> entries anymore, causing the memory bloat issue to become moot.
> So maybe we should just fix the wrong-context issue for now, and
> live with the GEQO bloat in the back branches.

Yes, I agree with that. For now your patch fixing the wrong context
issue is good enough.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-10-06 11:45:49 Re: enable_incremental_sort changes query behavior
Previous Message Pavel Borisov 2020-10-06 11:05:42 Re: Yet another fast GiST build