Re: pull-up subquery if JOIN-ON contains refs to upper-query

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date: 2025-04-02 16:39:58
Message-ID: e172de23-7acb-4330-a40a-58ead7a47944@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

My colleague reviewed my patch and gave feedback on how to improve it -
for some queries with data types that I did not consider, pull-up is not
applied, although it should. Some of them:

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
  FROM ta
 WHERE EXISTS (SELECT 1
                 FROM tb
                 JOIN tc
                   ON ta.id = tb.id
                  AND tb.id = ANY('{1}'::int[])
              );

                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on ta (actual rows=1.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 1
   SubPlan 1
     ->  Nested Loop (actual rows=0.50 loops=2)
           ->  Seq Scan on tb (actual rows=0.50 loops=2)
                 Filter: ((id = ANY ('{1}'::integer[])) AND (ta.id = id))
                 Rows Removed by Filter: 2
           ->  Seq Scan on tc (actual rows=1.00 loops=1)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active
               );
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on ta (actual rows=2.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   SubPlan 1
     ->  Nested Loop (actual rows=1.00 loops=2)
           ->  Seq Scan on tb (actual rows=1.00 loops=2)
                 Filter: (is_active AND (ta.id = id))
                 Rows Removed by Filter: 0
           ->  Seq Scan on tc (actual rows=1.00 loops=2)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active IS NOT NULL
               );

                                   QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on ta (actual rows=2.00 loops=1)
   Filter: EXISTS(SubPlan 1)
    SubPlan 1
       ->  Nested Loop (actual rows=1.00 loops=2)
             ->  Seq Scan on tb (actual rows=1.00 loops=2)
                   Filter: ((is_active IS NOT NULL) AND (ta.id = id))
                   Rows Removed by Filter: 0
             ->  Seq Scan on tc (actual rows=1.00 loops=2)

UPDATE tb SET is_active = NULL WHERE id = 2;

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
 SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.is_active IS NULL
               );
                              QUERY PLAN
----------------------------------------------------------------
 Seq Scan on ta (actual rows=1.00 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 1
   SubPlan 1
     ->  Nested Loop (actual rows=0.50 loops=2)
           ->  Seq Scan on tb (actual rows=0.50 loops=2)
                 Filter: ((is_active IS NULL) AND (ta.id = id))
                 Rows Removed by Filter: 4
           ->  Seq Scan on tc (actual rows=1.00 loops=1)

I see that I need to add a walker that, when traversing the tree,
determines whether there are conditions under which pull-up is
impossible - the presence of
volatility of functions and other restrictions, and leave the
transformation for the var objects that I added before, I described it
here.

Unfortunately, I need a few days to implement this and need time for a
review, and I think I will not have time to do this before the code
freeze, so
I am moving this to the next commitfest and not changing the status
"awaiting the author".

On 11.02.2025 18:59, Alena Rybakina wrote:
> On 10.02.2025 23:51, Ilia Evdokimov wrote:
>>
>> On 09.02.2025 18:14, Alena Rybakina wrote:
>>> Hi! I found another example where the transformation worked
>>> incorrectly and reconsidered the idea.
>>>
>>> As for conversion of exists_sublink_to_ANY, we need to get the
>>> flattened implicit-AND list of clauses and pull out the chunks of
>>> the WHERE clause that belong to the parent query,
>>> since we are called halfway through the parent's
>>> preprocess_expression() and earlier steps of preprocess_expression()
>>> wouldn't get applied to the pulled-up stuff unless we do them here.
>>> We also do some processing for vars depending on which side the var
>>> is on - if it's in a subquery, we only need to lower its level
>>> (varlevel) because subquery will be flatted, while
>>> for other vars that belong to the parent query, we need to do
>>> preparation to pull up the sub-select into top range table.
>>>
>>> For those expressions that we couldn't assign to either list, we
>>> define newWhere and apply both cases.
>>>
>>
>> When I run 'make -C contrib/ check', tests of postgres_fdw extension
>> failed. I might be wrong, but you should be careful with LIMIT.
>>
> Thank you for the review, I'm working on it.
>
Sorry for not responding, but I will fix this bug after I update the
code based on the comments above. Thank you for noticing and writing to
me, your feedback is very important.

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-04-02 16:40:54 Re: Draft for basic NUMA observability
Previous Message Peter Geoghegan 2025-04-02 16:36:10 Re: BTScanOpaqueData size slows down tests