Re: Removing unneeded self joins

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-05-02 23:04:40
Message-ID: CAPpHfdsUiwbGLbBwtdL+UCT5GCUVEU+SqQaaZjtcje3znU2JgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Richard!

On Thu, May 2, 2024 at 4:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> > One question for me is: Do we anticipate other lateral self-references
>> > except the TABLESAMPLE case? Looking into the extract_lateral_references
>> > implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
>> > pull up subqueries before extracting lateral references. So, if we have
>> > a reference to a subquery, it means we will not flatten this subquery
>> > and don't execute SJE. Do we need more code, as you have written in the
>> > first patch?
>>
>> I think my first patch was crap anyway. Your explanation seems
>> reasonable to me. I'm not sure this requires any more code. Probably
>> it would be enough to add more comments about this.
>
>
> The tablesample case is not the only factor that can cause a relation to
> have a lateral dependency on itself after self-join removal. It can
> also happen with PHVs. As an example, consider
>
> explain (costs off)
> select * from t t1
> left join lateral
> (select t1.a as t1a, * from t t2) t2
> on true
> where t1.a = t2.a;
> server closed the connection unexpectedly
>
> This is because after self-join removal, a PlaceHolderInfo's ph_lateral
> might contain rels mentioned in ph_eval_at, which we should get rid of.
>
> For the tablesample case, I agree that we should not consider relations
> with TABLESAMPLE clauses as candidates to be removed. Removing such a
> relation could potentially change the syntax of the query, as shown by
> Alexander's example. It seems to me that we can just check that in
> remove_self_joins_recurse, while we're collecting the base relations
> that are considered to be candidates for removal.
>
> This leads to the attached patch. This patch also includes some code
> refactoring for the surrounding code.

Great, thank you for your work on this!

I'd like to split this into separate patches for better granularity of
git history. I also added 0001 patch, which makes first usage of the
SJE acronym in file to come with disambiguation. Also, I've added
assert that ph_lateral and ph_eval_at didn't overlap before the
changes. I think this should help from the potential situation when
the changes we do could mask another bug.

I would appreciate your review of this patchset, and review from Andrei as well.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v2-0002-Minor-refactoring-for-self-join-elimination-code.patch application/octet-stream 3.1 KB
v2-0003-Forbid-self-join-elimination-on-table-sampling-sc.patch application/octet-stream 3.5 KB
v2-0004-Fix-self-join-elimination-work-with-PlaceHolderIn.patch application/octet-stream 3.0 KB
v2-0001-Clarify-the-SJE-self-join-elimination-acronym.patch application/octet-stream 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-05-02 23:19:34 Re: Removing unneeded self joins
Previous Message David Rowley 2024-05-02 22:55:06 Re: enhance the efficiency of migrating particularly large tables