Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-02 19:00:49
Message-ID: CAPpHfdugRg1Nix_82AsUQOcz6Q-761_Y440t9zQLwdhy+YcC2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Alena!

On Wed, Apr 2, 2025 at 5:33 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:

> I prepared a patch according to my suggestions, it just checks that the
> transformation is not carried out if there is a var element, there are
> changes only in one test, but I think it is correct.
>
> diff -U3
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> --- /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> 2025-04-02 02:50:07.018329864 +0300
> +++ /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> 2025-04-02 17:27:09.845104001 +0300
> @@ -3027,18 +3027,15 @@
> SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
> (SELECT (3)))::integer)
> );
> - QUERY PLAN
> -----------------------------------------------------
> - Nested Loop
> - -> Unique
> - -> Sort
> - Sort Key: "*VALUES*".column1
> - -> Values Scan on "*VALUES*"
> - SubPlan 1
> - -> Result
> - -> Index Scan using onek_unique1 on onek t
> - Index Cond: (unique1 = "*VALUES*".column1)
> -(9 rows)
> + QUERY
> PLAN
>
> +------------------------------------------------------------------------------------------------
> + Bitmap Heap Scan on onek t
> + Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan
> 1).col1)))::integer]))
> + -> Bitmap Index Scan on onek_unique1
> + Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan
> 1).col1)))::integer]))
> + SubPlan 1
> + -> Result
> +(6 rows)
>
> -- Alow to transformation and hold conversion between types of colemns and
> -- declared type of column pointed in RTE
>
Your patch turns back performance degradation issue that I mention in [1].

With your patch.
# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), (140050), ((select 1)));
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..23175.01 rows=7 width=4) (actual
time=0.455..417.929 rows=7.00 loops=1)
Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, 140050,
(InitPlan 1).col1]))
Rows Removed by Filter: 999993
Buffers: shared hit=2362 read=2063
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
Planning Time: 0.211 ms
Execution Time: 417.984 ms
(8 rows)

Without your patch.
# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), (140050), ((select 1)));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.19..17050.26 rows=7 width=4) (actual
time=0.289..210.335 rows=7.00 loops=1)
Hash Cond: (test.val = "*VALUES*".column1)
Buffers: shared hit=2211 read=2214
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
-> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4) (actual
time=0.203..85.925 rows=1000000.00 loops=1)
Buffers: shared hit=2211 read=2214
-> Hash (cost=0.09..0.09 rows=7 width=4) (actual time=0.067..0.068
rows=7.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.09 rows=7 width=4)
(actual time=0.004..0.023 rows=7.00 loops=1)
Planning Time: 0.321 ms
Execution Time: 210.409 ms
(12 rows)

Only when we have all consts in the array, we can have SAOP hashing which
is competitive with hashing of VALUES node. So, I'm continuing with my
version of patch.

Links.
1.
https://www.postgresql.org/message-id/CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-04-02 19:23:38 Re: Reducing the log spam
Previous Message Alvaro Herrera 2025-04-02 18:24:22 Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints