Re: Removing unneeded self joins

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-06-17 00:00:00
Message-ID: CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier. I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure. (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)
>

i have looked around the code.
about replace_varno and ChangeVarNodes:

ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;

if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```

---------------------------------------------------------------------
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.

then replace_varno will have problems with DELETE, UPDATE, MERGE
someway.
ChangeVarNodes solved this problem.

> Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> result relation). While I don't claim that that destroyed the entire
> use case for SJE, it certainly knocked its usefulness down by many
> notches, maybe even to the point where it's not worth putting in the
> effort needed to get it to re-committability. So I think we need to
> look harder at finding a way around that. Is the concern that
> RETURNING should return either old or new values depending on which
> RTE is mentioned? If so, maybe the feature Dean has proposed to
> allow RETURNING to access old values [1] is a prerequisite to moving
> forward. Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com

if only SELECT, no worth to make it being committed,
do you think support DML but no support RETURNING worth the effort?

excerpt from [1] latest patch:
+/* Returning behavior for Vars in RETURNING list */
+typedef enum VarReturningType
+{
+ VAR_RETURNING_DEFAULT, /* return OLD for DELETE, else return NEW */
+ VAR_RETURNING_OLD, /* return OLD for DELETE/UPDATE, else NULL */
+ VAR_RETURNING_NEW, /* return NEW for INSERT/UPDATE, else NULL */
+} VarReturningType;
+

typedef struct Var
{
Expr xpr;
@@ -265,6 +278,9 @@ typedef struct Var
*/
Index varlevelsup;

+ /* returning type of this var (see above) */
+ VarReturningType varreturningtype;
--------------------------------------------
example. e.g.
explain(costs off)
WITH t1 AS (SELECT * FROM emp1) UPDATE emp1 SET code = t1.code + 1
FROM t1 WHERE t1.id = emp1.id RETURNING emp1.code, t1.code;

the returning (emp1.code,t1.code) these two var the VarReturningType
is VAR_RETURNING_DEFAULT.
That means the patch (support-returning-old-new-v9.patch in [1]) see
the RETURNING (emp1.code, t1.code) are two different table.column
references.
but here we need to transform it to
"RETURNING new.code, old.code", i think.
that would be way more harder.

>Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.
i will try to explore this area. in this case would be
allow SJE apply to " RETURNING t1.code, t1.code".

I've attached 2 patches, based on the latest patch in this thread.
0001 mainly about replacing all replace_varno to ChangeVarNodes.
0002 makes SJE support for DML without RETURNING clause.

now SJE also works with updatable view. for example:
+CREATE TABLE sj_target (tid integer primary key, balance integer)
WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5,
50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;

+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON
t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)

> [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com

Attachment Content-Type Size
v2-0002-make-SJE-to-apply-DML.patch text/x-patch 22.3 KB
v2-0001-Remove-useless-self-joins.patch text/x-patch 119.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-06-17 00:03:28 Re: Using LibPq in TAP tests via FFI
Previous Message Andres Freund 2024-06-16 23:46:12 ecdh support causes unnecessary roundtrips