Re: Exists pull-up application with JoinExpr

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Exists pull-up application with JoinExpr
Date: 2024-12-25 11:20:42
Message-ID: 46bc4eaf-58c5-42ab-8041-d3380a0768de@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24.12.2024 13:25, Ranier Vilela wrote:
> Hi Alena.
>
> Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> escreveu:
>
> Hi, hackers!
>
> I found one pull-up that works if the inner join condition is
> written through the where condition,
>
> |create temp table ta (id int primary key, val int); insert into
> ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|
>
> |create temp table tb (id int primary key, aval int); insert into
> tb values(4,1); insert into tb values(5,1); insert into tb
> values(1,2); create temp table tc (id int primary key, aid int);
> insert into tc values(6,1); insert into tc values(7,2);|
>
> |EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *
> FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id
> <http://ta.id> = tb.id <http://tb.id>);|
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id =ta.id <http://ta.id>)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=67 read=12
> (14 rows)
>
> but it doesn't work if it is written through the outside condition.
>
> |alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
> OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON
> ta.id <http://ta.id> = tb.id <http://tb.id>); QUERY PLAN
> ------------------------------------------------------ Seq Scan on
> ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed
> by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual
> rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual
> rows=0 loops=3) Filter: (ta.id <http://ta.id> = id) Rows Removed
> by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1
> loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16
> read=9 (15 rows) |
>
> |I have written a patch to add this functionality and now it gives
> an query plan: |
>
> |alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
> OFF)
>  SELECT *
>    FROM ta
>   WHERE EXISTS (SELECT *
>                   FROM tb JOIN tc
>                   ON ta.id <http://ta.id> = tb.id <http://tb.id>);
>                      QUERY PLAN
> -------------------------------------------------------------------------
>  Nested Loop Semi Join (actual rows=1 loops=1)
>    Buffers: local hit=6
>    ->  Seq Scan on ta (actual rows=3 loops=1)
>          Buffers: local hit=1
>    ->  Nested Loop (actual rows=0 loops=3)
>          Buffers: local hit=5
>          ->  Index Only Scan using tb_pkey on tb (actual rows=0
> loops=3)
>                Index Cond: (id = ta.id <http://ta.id>)
>                Heap Fetches: 1
>                Buffers: local hit=4
>          ->  Seq Scan on tc (actual rows=1 loops=1)
>                Buffers: local hit=1
> (12 rows)|
>
> tb and tc form a Cartesian product, but in the case of the
> intersection condition with tuples from the table ta (ta.id
> <http://ta.id> = tb.id <http://tb.id>). So, according to the join
> condition, tb intersects only with 1, and only it gets into the
> result, but at the same time they appear twice - this is because
> of the Cartesian product of tb with tc
>
> |*How it works:*
> |
>
> I rewrote the code a bit so that it considers not only the quals
> in jointree->quals, but also those in join expression
> (subselect->jointree->fromlist). If they satisfy the conditions
> for using pull up, I add them to the list of clauses and form a
> "Bool" expression from them, joined by an "AND" operation.
>
> I took a look at this patch and I did a little polishing on it.
>
> And I believe that in testing, you need to set it to BUFFERS OFF,
> because of the recent change made to ANALYZE.
>
> The tests are failing, like this:
> QUERY PLAN
>  -------------------------------------------------------------------------
>  Nested Loop Semi Join (actual rows=2 loops=1)
> + Buffers: local hit=7
>  -> Seq Scan on ta (actual rows=2 loops=1)
> + Buffers: local hit=1
>  -> Nested Loop (actual rows=1 loops=2)
> + Buffers: local hit=6
>  -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
>  Index Cond: (id = ta.id <http://ta.id>)
>  Heap Fetches: 2
> + Buffers: local hit=4
>  -> Seq Scan on tc (actual rows=1 loops=2)
> -(7 rows)
> + Buffers: local hit=2
> +(12 rows)
>
Yes, you are right) Thank you for your interest to this thread)

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniil Davydov 2024-12-25 12:44:03 Improve consistency checks in tbm_prepare_shared_iterate
Previous Message Vladlen Popolitov 2024-12-25 10:39:42 Re: Windows UTF8 system locale