Re: Exists pull-up application with JoinExpr

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Exists pull-up application with JoinExpr
Date: 2024-12-24 10:25:37
Message-ID: CAEudQAoD707uh5Pjpg5NMyF-QO=fzajA+BmtcoqQAeXN1C+TkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <
a(dot)rybakina(at)postgrespro(dot)ru> escreveu:

> Hi, hackers!
>
> I found one pull-up that works if the inner join condition is written
> through the where condition,
>
> create temp table ta (id int primary key, val int);
> insert into ta values(1,1);
> insert into ta values(2,2);insert into ta values(3,3);
> create temp table tb (id int primary key, aval int);
> insert into tb values(4,1);
> insert into tb values(5,1);
> insert into tb values(1,2);
>
> create temp table tc (id int primary key, aid int);
> insert into tc values(6,1);
> insert into tc values(7,2);
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb, tc
> WHERE ta.id = tb.id);
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id = ta.id)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=67 read=12
> (14 rows)
>
> but it doesn't work if it is written through the outside condition.
>
> alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb JOIN tc
> ON ta.id = tb.id);
> QUERY PLAN
> ------------------------------------------------------
> Seq Scan on ta (actual rows=1 loops=1)
> Filter: EXISTS(SubPlan 1)
> Rows Removed by Filter: 2
> Buffers: local hit=5
> SubPlan 1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=4
> -> Seq Scan on tb (actual rows=0 loops=3)
> Filter: (ta.id = id)
> Rows Removed by Filter: 3
> Buffers: local hit=3
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=16 read=9
> (15 rows)
>
>
> I have written a patch to add this functionality and now it gives an query
> plan:
>
> alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb JOIN tc
> ON ta.id = tb.id);
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id = ta.id)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> (12 rows)
>
> tb and tc form a Cartesian product, but in the case of the intersection
> condition with tuples from the table ta (ta.id = tb.id). So, according to
> the join condition, tb intersects only with 1, and only it gets into the
> result, but at the same time they appear twice - this is because of the
> Cartesian product of tb with tc
> *How it works:*
>
> I rewrote the code a bit so that it considers not only the quals in
> jointree->quals, but also those in join expression
> (subselect->jointree->fromlist). If they satisfy the conditions for using
> pull up, I add them to the list of clauses and form a "Bool" expression
> from them, joined by an "AND" operation.
>
I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this:
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
-> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
-> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
-> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
Index Cond: (id = ta.id)
Heap Fetches: 2
+ Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

best regards,
Ranier Vilela

Attachment Content-Type Size
v1-0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patch application/octet-stream 11.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2024-12-24 11:36:55 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Shlok Kyal 2024-12-24 09:15:32 Re: Object identifier types in logical replication binary mode