Some problems regarding the self-join elimination code

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Some problems regarding the self-join elimination code
Date: 2025-04-02 13:26:44
Message-ID: CAMbWs49PE3CvnV8vrQ0Dr=HqgZZmX0tdNbzVNJxqc8yg-8kDQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While working on another patch, I looked at ChangeVarNodes() and found
some issues introduced by the self-join elimination commit. I think
it'd better to fix or improve them.

* ChangeVarNodes_walker includes special handling for RestrictInfo
nodes. And it adjusts RestrictInfo.orclause only if the rt_index of
the relation to be removed is contained in RestrictInfo.clause_relids.

I don't think this is correct. Even if the to-be-removed relation is
not present in clause_relids, it can still be found somewhere within
the orclause. As an example, consider:

create table t (a int primary key, b int, c int);

explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.c = 1 and (t2.b = t3.b or t2.b = 1);
server closed the connection unexpectedly

This query triggers an Assert failure in match_orclause_to_indexcol,
and the root cause is that the removed relation is still present
in the outer_relids of the OR-clauses.

To fix, I think we may need to run ChangeVarNodes_walker on the
orclause in all cases.

* After the adjustment, the num_base_rels is recalculated as:

+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);

I don't think this is correct either. num_base_rels is the number of
base rels in clause_relids and should not count outer-join relids.
And we have no guarantee that clause_relids does not include any
outer-join relids.

To fix, I think we should exclude all outer-join relids. Perhaps we
can leverage root->outer_join_rels to achieve this.

* I'm not sure why the self-join elimination commit removed all the
comments in ChangeVarNodes(Extended). I think these comments are very
helpful for understanding the code.

- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;

- /*
- * If we are starting at a Query, and sublevels_up is
zero, then we
- * must also fix rangetable indexes in the Query
itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex
and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +772,6 @@ ChangeVarNodes(Node *node, int rt_index, int
new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;

- /* this is unlikely to ever be used, but ... */

* Speaking of comments, I’ve noticed that some changes are lacking
comments. For example, there are almost no comments regarding the
special handling of RestrictInfo nodes in ChangeVarNodes_walker,
except for an explanation about adding the NullTest.

BTW, there's a typo in that explanation: qual() should be equal().

This is not a thorough review of the code; I just randomly looked at
ChangeVarNodes(). It's possible that there are other issues that
haven't been discovered, but I think we should at least fix these.
I'll provide a patch for the fixes later. Apologies for being so
nitpicky.

Thanks
Richard

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-04-02 13:32:50 RE: Fix slot synchronization with two_phase decoding enabled
Previous Message Fujii Masao 2025-04-02 13:11:37 Re: Reducing the log spam