From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com> |
Subject: | Re: A new strategy for pull-up correlated ANY_SUBLINK |
Date: | 2024-07-03 08:33:44 |
Message-ID: | 780533de-c9af-4823-94bf-dbbdb03c1bfa@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7/1/24 16:17, Andrei Lepikhov wrote:
> On 10/12/23 14:52, Andy Fan wrote:
>> Here the sublink can't be pulled up because of its reference to
>> the LHS of left join, the original logic is that no matter the 'b.t
>> in ..'
>> returns the true or false, the rows in LHS will be returned. If we
>> pull it up to LHS, some rows in LHS will be filtered out, which
>> breaks its original semantics.
> Hi,
> I spent some time trying to understand your sentence.
> I mean the following case:
>
> SELECT * FROM t1 LEFT JOIN t2
> ON t2.x IN (SELECT y FROM t3 WHERE t1.x=t3.x);
>
> I read [1,2,3], but I am still unsure why it is impossible in the case
> of OUTER JOIN. By setting the LATERAL clause, we forbid any clauses from
> the RTE subquery to bubble up as a top-level clause and filter tuples
> from LHS, am I wrong? Does it need more research or you can show some
> case to support your opinion - why this type of transformation must be
> disallowed?
>
> [1] https://www.postgresql.org/message-id/6531.1218473967%40sss.pgh.pa.us
> [2]
> https://www.postgresql.org/message-id/BANLkTikGFtGnAaXVh5%3DntRdN%2B4w%2Br%3DNPuw%40mail.gmail.com
> [3] https://www.vldb.org/conf/1992/P091.PDF
>
I delved into it a bit more. After reading [4,5] I invented query that
is analogue of the query above, but with manually pulled-up sublink:
EXPLAIN (COSTS OFF)
SELECT * FROM t1 LEFT JOIN t2 JOIN LATERAL
(SELECT t1.x AS x1, y,x FROM t3) q1 ON (t2.x=q1.y AND q1.x1=q1.x) ON true;
And you can see the plan:
Nested Loop Left Join
-> Seq Scan on t1
-> Hash Join
Hash Cond: (t2.x = t3.y)
-> Seq Scan on t2
-> Hash
-> Seq Scan on t3
Filter: (t1.x = x)
Just for fun, I played with MSSQL Server and if I read its explain
correctly, it also allows pulls-up sublink which mentions LHS:
-------------------------------------
Nested Loops(Left Outer Join, OUTER REFERENCES:(t1.x))
Table Scan(OBJECT:(t1))
Hash Match(Right Semi Join, HASH:(t3.y)=(t2.x),
RESIDUAL:(t2.x=t3.y))
Table Scan(OBJECT:(t3), WHERE:(t1.x=t3.x))
Table Scan(OBJECT:(t2))
-------------------------------------
(I cleaned MSSQL explain a little bit for clarity).
So, may we allow references to LHS in such sublink?
[4]
https://www.postgresql.org/message-id/flat/15523.1372190410%40sss.pgh.pa.us
[5]
https://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
--
regards, Andrei Lepikhov
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2024-07-03 08:46:08 | Re: Conflict Detection and Resolution |
Previous Message | Richard Guo | 2024-07-03 08:29:27 | Re: Eager aggregation, take 3 |