Re: Removing unneeded self joins

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: Removing unneeded self joins
Date: 2025-04-04 08:35:39
Message-ID: 28ab16e7-6ace-40aa-a76d-9a110799944d@gmail.com
Lists: pgsql-hackers

On 4/4/25 04:53, Richard Guo wrote:
> On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> I've got an off-list bug report from Alexander Lakhin involving a
>> placeholder variable. Alena and Andrei proposed a fix. It is fairly
>> simple: we just shouldn't remove PHVs during self-join elimination, as
>> they might still be referenced from other parts of a query. The patch
> is attached. I'm going to fix this if there are no objections.
>
> Hmm, I'm not sure about the fix. It seems to me that it simply
> prevents removing any PHVs in the self-join removal case. My concern
> is that this might result in PHVs that could actually be removed not
> being removed in many cases.
Let's go through the use cases.
If a PHV is needed only by the inner relation or only by the outer relation, there must be a clause in that relation's baserestrictinfo. That clause will be transferred to the kept relation, so we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I can imagine a case like this:

toKeep.x+toRemove.y=PHV

This clause will be transformed into "toKeep.x+toKeep.y=PHV" and pushed
down to the baserestrictinfo of the kept relation, so the PHV must be
preserved.
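The rewrite above can be made concrete with a minimal toy model in Python. This is not PostgreSQL code: the classes and the functions `remap_clause`, `clause_relids` and `referenced_phids` are hypothetical. Relid sets are plain frozensets standing in for the planner's Relids bitmapsets. Rel 1 plays toKeep and rel 2 plays toRemove, and the sketch assumes, for illustration only, that the PHV is evaluated at toRemove. After toRemove is replaced by toKeep, the clause references only the kept relation, but it still references the PHV.

```python
from dataclasses import dataclass

# Toy model only: hypothetical stand-ins, not PostgreSQL planner structures.
# Relid sets are frozensets of ints, mimicking Relids bitmapsets.

@dataclass(frozen=True)
class Var:
    relid: int
    attname: str

@dataclass(frozen=True)
class PHV:
    phid: int
    eval_at: frozenset  # relations where the PHV is assumed to be evaluated

@dataclass(frozen=True)
class Clause:
    vars: tuple  # plain Vars appearing in the clause
    phvs: tuple  # PlaceHolderVars appearing in the clause

def remap_relids(relids, to_remove, to_keep):
    """Replace to_remove by to_keep in a relid set (replacement, not removal)."""
    if to_remove not in relids:
        return relids
    return (relids - {to_remove}) | {to_keep}

def remap_clause(clause, to_remove, to_keep):
    """Rewrite every reference to to_remove so it points at to_keep."""
    new_vars = tuple(Var(to_keep, v.attname) if v.relid == to_remove else v
                     for v in clause.vars)
    new_phvs = tuple(PHV(p.phid, remap_relids(p.eval_at, to_remove, to_keep))
                     for p in clause.phvs)
    return Clause(new_vars, new_phvs)

def clause_relids(clause):
    """Relations a clause depends on: its Vars plus its PHVs' eval_at sets."""
    relids = frozenset(v.relid for v in clause.vars)
    for p in clause.phvs:
        relids = relids | p.eval_at
    return relids

def referenced_phids(clauses):
    """Ids of all PHVs still referenced by the given clauses."""
    return {p.phid for c in clauses for p in c.phvs}

# toKeep.x + toRemove.y = PHV, with toKeep = rel 1 and toRemove = rel 2
TO_KEEP, TO_REMOVE = 1, 2
phv = PHV(phid=7, eval_at=frozenset({TO_REMOVE}))
join_clause = Clause((Var(TO_KEEP, "x"), Var(TO_REMOVE, "y")), (phv,))

rewritten = remap_clause(join_clause, TO_REMOVE, TO_KEEP)
print(clause_relids(join_clause))     # frozenset({1, 2}): a join clause
print(clause_relids(rewritten))       # frozenset({1}): a restriction on toKeep
print(referenced_phids([rewritten]))  # {7}: the PHV is still in use
```

Because the rewritten clause depends on a single relation, it belongs in that relation's restriction list. It still mentions PHV 7, so dropping that PHV would leave a dangling reference.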
I think it is possible to construct a rather narrow case with a clause
like the following:

PHV_evaluated_at_inner = PHV_evaluated_at_outer

That would need to be confirmed with a reproducer. But even if such a
case exists, it seems to pose no risk to subsequent selectivity
estimation compared to the original clause, and it is too narrow to
matter, isn't it?
In all other cases, the PHV is needed by something else, and we can't
remove it.

Maybe I missed the case you have in mind? I would like to understand it.

>
> Besides, there's the specific comment above this code explaining the
> logic behind the removal of PHVs. Shouldn't that comment be updated
> to reflect the changes?
That makes sense: for now, it seems that PHV removal should only be
applied when an outer join is removed. In the case of SJE, we logically
perform a replacement, not a removal, and we should not reduce the
number of entities involved.
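Here is a simplified sketch of that distinction. It is again a hypothetical toy model, not actual planner code. The names `PHInfo`, `after_outer_join_removal` and `after_self_join_elimination` are illustrative only, and the outer-join condition is a loose rendering of "the PHV is needed only within the join being removed". The same PHV is dropped in the outer-join-removal path. Under SJE it survives, with toRemove remapped to toKeep, which matches the proposed rule of never removing PHVs during self-join elimination.

```python
from dataclasses import dataclass

# Hypothetical toy stand-in for placeholder bookkeeping; not PostgreSQL code.
@dataclass(frozen=True)
class PHInfo:
    phid: int
    eval_at: frozenset  # where the PHV is computed
    needed: frozenset   # where its value is used

def swap(relids, old, new):
    """Replace old by new in a relid set, if present."""
    return (relids - {old}) | {new} if old in relids else relids

def after_outer_join_removal(phinfos, removed, join_relids):
    # Drop a PHV only if it is computed at the removed rel and nothing
    # outside the removed join needs it. (Adjusting the survivors' relid
    # sets is omitted in this sketch.)
    return [p for p in phinfos
            if not (p.needed <= join_relids and removed in p.eval_at)]

def after_self_join_elimination(phinfos, to_remove, to_keep):
    # Replacement, not removal: every PHV survives, to_remove -> to_keep.
    return [PHInfo(p.phid,
                   swap(p.eval_at, to_remove, to_keep),
                   swap(p.needed, to_remove, to_keep))
            for p in phinfos]

TO_KEEP, TO_REMOVE = 1, 2
ph = PHInfo(phid=7, eval_at=frozenset({TO_REMOVE}),
            needed=frozenset({TO_KEEP, TO_REMOVE}))

print(after_outer_join_removal([ph], TO_REMOVE, frozenset({TO_KEEP, TO_REMOVE})))
# []
print(after_self_join_elimination([ph], TO_REMOVE, TO_KEEP))
# [PHInfo(phid=7, eval_at=frozenset({1}), needed=frozenset({1}))]
```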

--
regards, Andrei Lepikhov
