Re: Removing unneeded self joins

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-10-08 03:54:02
Message-ID: 1b69a5a4-674e-4bd2-b711-e15ed593ec7d@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/24/24 18:07, Dean Rasheed wrote:
> On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>>
>>> The new function replace_relid() looks to be the same as adjust_relid_set().
>>
>> They are similar, not the same. replace_relid() has handling for
>> negative newId, while adjust_relid_set() hasn't. One thing I'd like
>> to borrow from adjust_relid_set() to replace_relid() is the usage of
>> IS_SPECIAL_VARNO() macro.
>
> Ah, that makes sense. In that case, I'd say that replace_relid()
> should go in analyzejoins.c (and be a local function there), since
> that's the only place that requires this special negative newId
> handling.
Done. See attachment.

> So you have to remember that "inner" is "outer" and "outer" is "inner"
> when going into generate_join_implied_equalities() from
> remove_self_joins_one_group(). And the same thing happens when calling
> innerrel_is_unique_ext() and match_unique_clauses(). I think all that
> could be resolved by swapping "inner" and "outer" in the variable
> names and comments in remove_self_joins_one_group().
As you can see, when you dive into the implementation of the
generate_join_implied_equalities, an inner join works symmetrically. We
don't need any other clauses here, except mentioning both inner and
outer to prove self-join. So, it doesn't matter which side to choose as
an inner - all the effect can be: instead of x1=x2, we will have clause
x2=x1.

> selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
>
> That appears to be pointless, because is_innerrel_unique_for() will
> filter the restrictlist it is given, removing those baserestrictinfo
> clauses (because I think they'll always have can_join = false). And
> then relation_has_unique_index_ext() will re-add them:
Yeah, we used to have it to work out the case, then restrictlist == NIL.
I agree, it may be a bit strange and too expensive. Let's add a flag
check inside the innerrel_is_unique_ext instead.

> This presumes that any caller interested in knowing the extra
> baserestrictinfo clauses used to prove uniqueness must be looking at a
> self join. That may be true today, but it doesn't seem like a good API
> design choice. I think it would be better to just add "self_join" as
> an extra parameter, and also maybe have the function return the
> UniqueRelInfo containing the "extra_clauses", or NULL if it's not
> unique. That way, it would be more extensible, if we wanted it to
> return more information in the future.
If more users are interested in this function, we can just rename the
flag self_join to something more universal - extra_prove, I'm not sure.
>
> Instead of adding relation_has_unique_index_ext(), maybe it would be
> OK to just change the signature of relation_has_unique_index_for(). It
> looks like it's only called from one other place outside
> analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
I vote against it remembering of extensions and broken APIs.
>
> Should match_unique_clauses() be comparing mergeopfamilies or opnos to
> ensure that the clauses are using the same equality operator?

Having the same variable side, same constant, unique index? I can
imagine only a self-made algebra, which can break something here, but it
looks like over-engineering, right?

Attached version is rebased onto the new master. Because of a3179ab it
may be necessary to do more detailed review.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
v8-0001-Remove-useless-self-joins.patch text/x-patch 122.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-10-08 04:19:52 Re: Function for listing pg_wal/summaries directory
Previous Message Fujii Masao 2024-10-08 03:41:16 Re: Function for listing pg_wal/summaries directory