Re: A new strategy for pull-up correlated ANY_SUBLINK

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: A new strategy for pull-up correlated ANY_SUBLINK
Date: 2024-07-03 08:33:44
Message-ID: 780533de-c9af-4823-94bf-dbbdb03c1bfa@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/1/24 16:17, Andrei Lepikhov wrote:
> On 10/12/23 14:52, Andy Fan wrote:
>> Here the sublink can't be pulled up because of its reference to
>> the  LHS of left join, the original logic is that no matter the 'b.t
>> in ..'
>> returns the true or false,  the rows in LHS will be returned.  If we
>> pull it up to LHS, some rows in LHS will be filtered out, which
>> breaks its original semantics.
> Hi,
> I spent some time trying to understand your sentence.
> I mean the following case:
>
> SELECT * FROM t1 LEFT JOIN t2
>   ON t2.x IN (SELECT y FROM t3 WHERE t1.x=t3.x);
>
> I read [1,2,3], but I am still unsure why it is impossible in the case
> of OUTER JOIN. By setting the LATERAL clause, we forbid any clauses from
> the RTE subquery to bubble up as a top-level clause and filter tuples
> from LHS, am I wrong? Does it need more research or you can show some
> case to support your opinion - why this type of transformation must be
> disallowed?
>
> [1] https://www.postgresql.org/message-id/6531.1218473967%40sss.pgh.pa.us
> [2]
> https://www.postgresql.org/message-id/BANLkTikGFtGnAaXVh5%3DntRdN%2B4w%2Br%3DNPuw%40mail.gmail.com
> [3] https://www.vldb.org/conf/1992/P091.PDF
>

I delved into it a bit more. After reading [4,5] I invented query that
is analogue of the query above, but with manually pulled-up sublink:

EXPLAIN (COSTS OFF)
SELECT * FROM t1 LEFT JOIN t2 JOIN LATERAL
(SELECT t1.x AS x1, y,x FROM t3) q1 ON (t2.x=q1.y AND q1.x1=q1.x) ON true;

And you can see the plan:

Nested Loop Left Join
-> Seq Scan on t1
-> Hash Join
Hash Cond: (t2.x = t3.y)
-> Seq Scan on t2
-> Hash
-> Seq Scan on t3
Filter: (t1.x = x)

Just for fun, I played with MSSQL Server and if I read its explain
correctly, it also allows pulls-up sublink which mentions LHS:

-------------------------------------
Nested Loops(Left Outer Join, OUTER REFERENCES:(t1.x))
Table Scan(OBJECT:(t1))
Hash Match(Right Semi Join, HASH:(t3.y)=(t2.x),
RESIDUAL:(t2.x=t3.y))
Table Scan(OBJECT:(t3), WHERE:(t1.x=t3.x))
Table Scan(OBJECT:(t2))
-------------------------------------

(I cleaned MSSQL explain a little bit for clarity).
So, may we allow references to LHS in such sublink?

[4]
https://www.postgresql.org/message-id/flat/15523.1372190410%40sss.pgh.pa.us
[5]
https://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local

--
regards, Andrei Lepikhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2024-07-03 08:46:08 Re: Conflict Detection and Resolution
Previous Message Richard Guo 2024-07-03 08:29:27 Re: Eager aggregation, take 3