Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2025-04-03 14:18:46
Message-ID: 6865d6c1-8bb0-4528-af96-f62220b67f27@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02.04.2025 22:00, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Wed, Apr 2, 2025 at 5:33 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> I prepared a patch according to my suggestions, it just checks
> that the transformation is not carried out if there is a var
> element, there are changes only in one test, but I think it is
> correct.
>
> diff -U3
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> ---
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> 2025-04-02 02:50:07.018329864 +0300
> +++
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> 2025-04-02 17:27:09.845104001 +0300
> @@ -3027,18 +3027,15 @@
>  SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
>    (SELECT (3)))::integer)
>  );
> -                     QUERY PLAN
> -----------------------------------------------------
> - Nested Loop
> -   ->  Unique
> -         ->  Sort
> -               Sort Key: "*VALUES*".column1
> -               ->  Values Scan on "*VALUES*"
> -                     SubPlan 1
> -                       ->  Result
> -   ->  Index Scan using onek_unique1 on onek t
> -         Index Cond: (unique1 = "*VALUES*".column1)
> -(9 rows)
> +                                           QUERY PLAN
> +------------------------------------------------------------------------------------------------
> + Bitmap Heap Scan on onek t
> +   Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   ->  Bitmap Index Scan on onek_unique1
> +         Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   SubPlan 1
> +     ->  Result
> +(6 rows)
>
>  -- Alow to transformation and hold conversion between types of
> colemns and
>  -- declared type of column pointed in RTE
>
> Your patch turns back performance degradation issue that I mention in [1].
>
> With your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
>             QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.01..23175.01 rows=7 width=4) (actual
> time=0.455..417.929 rows=7.00 loops=1)
>    Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, 140050,
> (InitPlan 1).col1]))
>    Rows Removed by Filter: 999993
>    Buffers: shared hit=2362 read=2063
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
>  Planning Time: 0.211 ms
>  Execution Time: 417.984 ms
> (8 rows)
>
> Without your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
>  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=0.19..17050.26 rows=7 width=4) (actual
> time=0.289..210.335 rows=7.00 loops=1)
>    Hash Cond: (test.val = "*VALUES*".column1)
>    Buffers: shared hit=2211 read=2214
>    InitPlan 1
>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
>    ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=4)
> (actual time=0.203..85.925 rows=1000000.00 loops=1)
>          Buffers: shared hit=2211 read=2214
>    ->  Hash  (cost=0.09..0.09 rows=7 width=4) (actual
> time=0.067..0.068 rows=7.00 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.09 rows=7
> width=4) (actual time=0.004..0.023 rows=7.00 loops=1)
>  Planning Time: 0.321 ms
>  Execution Time: 210.409 ms
> (12 rows)
>
> Only when we have all consts in the array, we can have SAOP hashing
> which is competitive with hashing of VALUES node.  So, I'm continuing
> with my version of patch.
>
> Links.
> 1.
> https://www.postgresql.org/message-id/CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw%40mail.gmail.com
>
>
Okay, I agree with you.

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2025-04-03 14:25:58 Re: [PATCH] Refactor SLRU to always use long file names
Previous Message Robert Haas 2025-04-03 14:10:26 Re: RFC: Logging plan of the running query