Re: Removing unneeded self joins

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-10-28 21:18:27
Message-ID: CAPpHfds=AapoZR23bU4YzT2Of_hOCr8ecPdOX6MOCK=TrAKMkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 8, 2024 at 6:54 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 7/24/24 18:07, Dean Rasheed wrote:
> > So you have to remember that "inner" is "outer" and "outer" is "inner"
> > when going into generate_join_implied_equalities() from
> > remove_self_joins_one_group(). And the same thing happens when calling
> > innerrel_is_unique_ext() and match_unique_clauses(). I think all that
> > could be resolved by swapping "inner" and "outer" in the variable
> > names and comments in remove_self_joins_one_group().
> As you can see, when you dive into the implementation of the
> generate_join_implied_equalities, an inner join works symmetrically. We
> don't need any other clauses here, except mentioning both inner and
> outer to prove self-join. So, it doesn't matter which side to choose as
> an inner - all the effect can be: instead of x1=x2, we will have clause
> x2=x1.

I think switching places of inner and outer still makes sense in terms
of readability. I've done so for v10. Also, added a comment
regarding symmetry of arguments for generate_join_implied_equalities()
call.

> > This presumes that any caller interested in knowing the extra
> > baserestrictinfo clauses used to prove uniqueness must be looking at a
> > self join. That may be true today, but it doesn't seem like a good API
> > design choice. I think it would be better to just add "self_join" as
> > an extra parameter, and also maybe have the function return the
> > UniqueRelInfo containing the "extra_clauses", or NULL if it's not
> > unique. That way, it would be more extensible, if we wanted it to
> > return more information in the future.
> If more users are interested in this function, we can just rename the
> flag self_join to something more universal - extra_prove, I'm not sure.
> >
> > Instead of adding relation_has_unique_index_ext(), maybe it would be
> > OK to just change the signature of relation_has_unique_index_for(). It
> > looks like it's only called from one other place outside
> > analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
> I vote against it remembering of extensions and broken APIs.

I spend some time using github search to find usage of
innerrel_is_unique() and relation_has_unique_index() in extensions.
Didn't find extensions actually use them. Some forks are using them
in the way PostgreSQL doesn't. But for forks it's way easier to adopt
such trivial API changes. If even there are some extensions using
this functions, it wouldn't be a terrible case of the broken API. The
current way seems to be rather bloating of our API. This is why I
decided to remove innerrel_is_unique_ext() and
relation_has_unique_index_ext(), and just add arguments to
innerrel_is_unique() and relation_has_unique_index_for().

I also did some rephrasing and grammar fixes for comments.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v10-0001-Remove-useless-self-joins.patch application/octet-stream 123.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-10-28 21:24:02 Re: sunsetting md5 password support
Previous Message Michael Banck 2024-10-28 21:10:58 Re: trying again to get incremental backup