Re: Removing unneeded self joins

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-07-11 07:43:00
Message-ID: CACJufxF-ubw739W+33_rFSA+YTdRTDcKwVidKwXAS8NCy4nx+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 7/2/24 07:25, jian he wrote:
> > to make sure it's correct, I have added a lot of tests,
> > Some of this may be contrived, maybe some of the tests are redundant.
> Thanks for your job!
> I passed through the patches and have some notes:
> 1. Patch 0001 has not been applied anymore since the previous week's
> changes in the core. Also, there is one place with trailing whitespace.

thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.

also thanks to Alexander Korotkov's tip.
I added a bool change_RangeTblRef to ChangeVarNodes_context.
default is true, so won't influence ChangeVarNodes.
After that create a function ChangeVarNodesExtended.
so now replace_varno replaced by ChangeVarNodes.

now in ChangeVarNodes_walker we've add:
```
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
bool is_req_equal =
(rinfo->required_relids == rinfo->clause_relids);
bool clause_relids_is_multiple =
(bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
...
}
```
but this part, we don't have much comments, adding some comments would be good.
but I am not sure how.

static bool
match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
Index relid)
but actually we call it via:
if (!match_unique_clauses(root, inner, uclauses, outer->relid))

I am not sure whether the second argument is "inner" or "outer".
Maybe it will cause confusion.
same with innerrel_is_unique_ext.

/*
* At this stage, joininfo lists of inner and outer can contain
* only clauses, required for a superior outer join that can't
* influence this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
build_joinrel_restrictlist require joinrel, innerrel, outrel, but here
we only have innerrel, outterrel.
so i am confused with the comments.

i add following code snippets after generate_join_implied_equalities
```
if (restrictlist == NIL)
continue
```

I have some confusion with the comments.
/*
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
* subset of join quals. We can use 'force_cache' == true when all
* join quals are self-join quals. Otherwise, we could end up
* putting false negatives in the cache.
*/
if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
outer, JOIN_INNER, selfjoinquals,
list_length(otherjoinquals) == 0,
&uclauses))
continue;

"unique rel cache", not sure the meaning, obviously, "relcache" has a
different meaning.
so i am slightly confused with
"We must bypass the unique rel cache here since we're possibly using a
subset of join quals"

i have refactored comments below
```
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
```.
please check v5-0002 for comments refactor.

Attachment Content-Type Size
v5-0002-refactor-comments-in-remove_self_joins_one_group.patch text/x-patch 1.6 KB
v5-0001-Remove-useless-self-joins.patch text/x-patch 122.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2024-07-11 08:07:20 CFbot failed on Windows platform
Previous Message Michael Paquier 2024-07-11 07:42:22 Re: Pluggable cumulative statistics