From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: A new strategy for pull-up correlated ANY_SUBLINK |
Date: | 2023-04-05 07:15:42 |
Message-ID: | CAKU4AWpp8aANvwU1ivjPtF3AakFjTf1jBm_OFmPSdCqizjBQLw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tom:
Sorry for the delayed response! I think my knowledge has been refreshed
for this discussion.
> One thing I'm not at all clear about is whether we need to restrict
> the optimization so that it doesn't occur if the subquery contains
> outer references falling outside available_rels. I think that that
> case is covered by is_simple_subquery() deciding later to not pull up
> the subquery based on LATERAL restrictions, but maybe that misses
> something.
>
I think we need the restriction and that should be enough for this feature
. Given the query Richard provided before:
explain
select * from tenk1 A where exists
(select 1 from tenk2 B
where A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd));
It first can be converted to the below format without any issue.
SELECT * FROM tenk1 A SEMI JOIN tenk2 B
on A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd);
Then without the restriction, since we only pull the varnos from
sublink->testexpr, then it is {A}, so it convert to
SELECT * FROM
(tenk1 A SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C)
ON c.odd = b.odd AND a.hundred = v.hundred)
SEMI JOIN on tenk2 B ON TRUE;
then the above query is NOT A VALID QUERY since:
1. The above query is *not* same as
SELECT * FROM (tenk1 A SEMI JOIN tenk2 B) on true
SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C) v
ON v.odd = b.odd;
2. The above query requires b.odd when B is not available. So it is
right that an optimizer can't generate a plan for it. The fix would
be to do the restriction before applying this optimization.
I'm not sure pull-up-subquery can play any role here, IIUC, the bad thing
happens before pull-up-subquery.
I also write & analyze more test and found no issue by me
1. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up to rarg of the left join since A.hundred is not
available.
2. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
==> should not be pull-up to rarg of the left join since A.odd is not
available.
3. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should be pull-up to rarg of left join.
4. SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> pull-up as expected.
5. SELECT * FROM tenk1 A RIGHT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up into larg of left join since b.odd is not
available.
About the existing test case changes because of this patch, they do
requires on the sublink is planned to a subPlan, so I introduces the below
changes to keep the original intention.
Changes
A in (SELECT A FROM ..)
To
(A, random() > 0) in (SELECT a, random() > 0 FROM ..);
I'm also wondering whether the similar restriction in
> convert_EXISTS_sublink_to_join could be removed similarly.
> In this light it was a mistake for convert_EXISTS_sublink_to_join
> to manage the pullup itself rather than doing it in the two-step
> fashion that convert_ANY_sublink_to_join does it.
>
>
Yes, it is true! I prefer to believe this deserves a separate patch.
Any feedback is welcome!
--
Best Regards
Andy Fan
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Pull-up-direct-correlated-ANY_SUBLINK-using-later.patch | application/x-patch | 14.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2023-04-05 07:16:12 | Re: Should vacuum process config file reload more often |
Previous Message | Daniel Gustafsson | 2023-04-05 07:11:47 | Re: Fix code comment in postgres_fdw.c |