Re: Removing unneeded self joins

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-07-04 02:14:00
Message-ID: CACJufxFOuWQecXxWdp1RQnhoznqaDYo7jBRLOHmr7HG6hbugyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 3, 2024 at 11:39 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Mon, Jun 17, 2024 at 3:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >
> > > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > > I want to go on record right now as disagreeing with the plan proposed
> > > > in the commit message for the revert commit, namely, committing this
> > > > again early in the v18 cycle. I don't think Tom would have proposed
> > > > reverting this feature unless he believed that it had more serious
> > > > problems than could be easily fixed in a short period of time. I think
> > > > that concern is well-founded, given the number of fixes that were
> > > > committed. It seems likely that the patch needs significant rework and
> > > > stabilization before it gets committed again, and I think it shouldn't
> > > > be committed again without explicit agreement from Tom or one of the
> > > > other committers who have significant experience with the query
> > > > planner.
> > >
> > > FWIW I accept some of the blame here, for not having paid any
> > > attention to the SJE work earlier. I had other things on my mind
> > > for most of last year, and not enough bandwidth to help.
> > >
> > > The main thing I'd like to understand before we try this again is
> > > why SJE needed so much new query-tree-manipulation infrastructure.
> > > I would have expected it to be very similar to the left-join
> > > elimination we do already, and therefore to mostly just share the
> > > existing infrastructure. (I also harbor suspicions that some of
> > > the new code existed just because someone didn't research what
> > > was already there --- for instance, the now-removed replace_varno
> > > sure looks like ChangeVarNodes should have been used instead.)
> > >
> >
> > i have looked around the code.
> > about replace_varno and ChangeVarNodes:
> >
> > ChangeVarNodes
> > have
> > ````
> > if (IsA(node, RangeTblRef))
> > {
> > RangeTblRef *rtr = (RangeTblRef *) node;
> >
> > if (context->sublevels_up == 0 &&
> > rtr->rtindex == context->rt_index)
> > rtr->rtindex = context->new_index;
> > /* the subquery itself is visited separately */
> > return false;
> > }
> > ````
> > if ChangeVarNodes executed the above code in remove_useless_self_joins and
> > remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
> > to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
> > *nremoved will be zero.
> > then the below code error branch will be executed.
> > ````
> > joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
> > if (nremoved != 1)
> > elog(ERROR, "failed to find relation %d in joinlist", relid);
> > ```
>
> Did you manage to overcome this problem in your patch? If not, why do
> regression tests pass while this seems to affect pretty much every
> self-join removal? If so, how did you do that?
>

in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.

Andrei Lepikhov:
+ /* Replace varno in all the query structures */
+ replace_varno((Node *) root->parse, toRemove->relid, toKeep->relid);
So Andrei Lepikhov's change didn't touch joinlist,
Query->resultRelation, Query->mergeTargetRelation.

Then in v3-0002 I tried to make SJE work with UPDATE, i thought it worked well,
because ChangeVarNodes also takes care of Query->resultRelation,
Query->mergeTargetRelation.
then later your EPQ demenonsate shows that's not enough.

so, in summary, in v3-0001, by changing all replace_varno to ChangeVarNodes
paves ways to make SJE apply to UPDATE/DELETE/MERGE.
It's just that we need to reverse some effects of ChangeVarNodes.
(restore_rangetblref)

> > > Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> > > result relation). While I don't claim that that destroyed the entire
> > > use case for SJE, it certainly knocked its usefulness down by many
> > > notches, maybe even to the point where it's not worth putting in the
> > > effort needed to get it to re-committability. So I think we need to
> > > look harder at finding a way around that. Is the concern that
> > > RETURNING should return either old or new values depending on which
> > > RTE is mentioned? If so, maybe the feature Dean has proposed to
> > > allow RETURNING to access old values [1] is a prerequisite to moving
> > > forward. Alternatively, perhaps it'd be good enough to forbid SJE
> > > only when the non-target relation is actually mentioned in RETURNING.
> > >
>
> It appears you didn't try to address the EPQ problem, which seems to
> me even more serious than the RETURNING problem.
>
> See the following example.
>
> Session 1
> # create table test (id int primary key, val int);
> # insert into test values (1,1);
> # begin;
> # update test set val = val + 1 where id = 1;
>
> Session 2
> # update test set val = t.val + 1 from test t where test.id = t.id;
> (wait)
>
> Session 1
> # commit;
>
current mechanism, in this example context,
SJE can translate ```update test set val = t.val + 1 from test t where
test.id = t.id;``` as good as to
```update test set val = val + 1```.
if we replace it that way, then this example would result val = 3.

but without SJE,
```update test set val = t.val + 1 from test t where test.id = t.id;```
will result val = 2.

you mentioned the EPQ problem, previously i don't know what that means.
now i see, I feel like it is quite challenging to resolve it.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-07-04 02:55:59 Re: Cannot find a working 64-bit integer type on Illumos
Previous Message David Rowley 2024-07-04 01:29:39 Re: On disable_cost