Re: BUG #18429: Inconsistent results on similar queries with join lateral

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Benoit Ryder <b(dot)ryder(at)ateme(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18429: Inconsistent results on similar queries with join lateral
Date: 2024-04-14 16:44:37
Message-ID: 508653.1713113077@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> I wondered that we could fix this issue by checking em_nullable_relids
> in generate_join_implied_equalities_normal when we determine if an EC
> member is computable at the join. Then I realize that this is not easy
> to achieve without knowing the exact join(s) where the nulling would
> happen, which is exactly what the nullingrel stuff introduced in v16
> does. So your proposed fix seems the right way to go.

Yeah. The reason these queries work correctly in v16+ is that
get_baserel_parampathinfo calls generate_join_implied_equalities
with joinrelids that don't include the outer join's relid, so
the latter won't produce any join clauses that need to be evaluated
above the outer join. But later, when build_joinrel_restrictlist
calls generate_join_implied_equalities, it does include the outer
join's relid, so we correctly produce and enforce the clause as a
filter clause at the outer join's plan level.

However, pre-v16, those two calls pass the exact same parameters
and necessarily get the exact same clauses. We could only fix that
with an API change for generate_join_implied_equalities, which seems
dangerous in stable branches. So I think filtering the clause list
in get_baserel_parampathinfo is the right direction for a solution
there. We can make it cleaner than in my WIP patch though...

> Now that we've learned that join_clause_is_movable_into's heuristic
> about physically referencing the target rel can fail for EC-derived
> clauses, I'm kind of concerned that we may end up with duplicate clauses
> in the final plan, since we do not check EC-derived clauses against
> join_clause_is_movable_into in get_baserel_parampathinfo while we do in
> create_nestloop_path. What if we have an EC-derived clause that in
> get_baserel_parampathinfo it is put into ppi_clauses while in
> create_nestloop_path it does not pass the movability checking? Is it
> possible to occur, or is it just my illusion?

I'm not sure either, but it certainly seems like a hazard. Also,
get_joinrel_parampathinfo is really depending on getting consistent
results from join_clause_is_movable_into to assign clauses to the
correct join level. So after sleeping on it I think that "the
results of generate_join_implied_equalities should pass
join_clause_is_movable_into" is an invariant we don't really want to
let go of, meaning that it would be a good idea to fix equivclass.c
to ensure that's true in these child-rel corner cases. That's not
very hard, requiring just a small hack in create_join_clause, as
attached. It's not that much of a hack either because there are
other places in equivclass.c that force the relid sets for child
expressions to be more than what pull_varnos would conclude (search
for comments mentioning pull_varnos).

Having done that, we can add code in HEAD/v16 to assert that
join_clause_is_movable_into is true here, while in the older branches
we can use it to filter rather than klugily checking nullable_relids
directly. So I end with the attached two patches.

I didn't include the new test case in the HEAD/v16 patch; since it
doesn't represent a live bug for those branches I felt like maybe
it's not worth the test cycles going forward. But there's certainly
room for the opposite opinion. What do you think?

regards, tom lane

Attachment Content-Type Size
v2-fix-bug-18429-head.patch text/x-diff 2.6 KB
v2-fix-bug-18429-v15.patch text/x-diff 5.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2024-04-15 03:42:10 Re: Re: BUG #18305: Unexpected error: "WindowFunc not found in subplan target lists" triggered by subqueries
Previous Message PG Bug reporting form 2024-04-14 16:08:07 BUG #18433: Logical replication timeout