Quick Links

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject:	Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date:	2025-04-03 14:18:46
Message-ID:	6865d6c1-8bb0-4528-af96-f62220b67f27@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 02.04.2025 22:00, Alexander Korotkov wrote:
> Hi, Alena!
>
> On Wed, Apr 2, 2025 at 5:33 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> I prepared a patch according to my suggestions, it just checks
> that the transformation is not carried out if there is a var
> element, there are changes only in one test, but I think it is
> correct.
>
> diff -U3
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> ---
> /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out
> 2025-04-02 02:50:07.018329864 +0300
> +++
> /home/alena/postgrespro_or3/src/test/regress/results/subselect.out
> 2025-04-02 17:27:09.845104001 +0300
> @@ -3027,18 +3027,15 @@
> SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
>    (SELECT (3)))::integer)
> );
> -                     QUERY PLAN
> -----------------------------------------------------
> - Nested Loop
> -   -> Unique
> -         -> Sort
> -               Sort Key: "*VALUES*".column1
> -               -> Values Scan on "*VALUES*"
> -                     SubPlan 1
> -                       -> Result
> -   -> Index Scan using onek_unique1 on onek t
> -         Index Cond: (unique1 = "*VALUES*".column1)
> -(9 rows)
> +                                           QUERY PLAN
> +------------------------------------------------------------------------------------------------
> + Bitmap Heap Scan on onek t
> +   Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   -> Bitmap Index Scan on onek_unique1
> +         Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed
> SubPlan 1).col1)))::integer]))
> +   SubPlan 1
> +     -> Result
> +(6 rows)
>
> -- Alow to transformation and hold conversion between types of
> colemns and
> -- declared type of column pointed in RTE
>
> Your patch turns back performance degradation issue that I mention in [1].
>
> With your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------
> Seq Scan on test (cost=0.01..23175.01 rows=7 width=4) (actual
> time=0.455..417.929 rows=7.00 loops=1)
> Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, 140050,
> (InitPlan 1).col1]))
> Rows Removed by Filter: 999993
> Buffers: shared hit=2362 read=2063
> InitPlan 1
> -> Result (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
> Planning Time: 0.211 ms
> Execution Time: 417.984 ms
> (8 rows)
>
> Without your patch.
> # explain analyze select * from test where val in (VALUES (5000),
> (4000), (9000), (2000), (1000), (140050), ((select 1)));
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=0.19..17050.26 rows=7 width=4) (actual
> time=0.289..210.335 rows=7.00 loops=1)
> Hash Cond: (test.val = "*VALUES*".column1)
> Buffers: shared hit=2211 read=2214
> InitPlan 1
> -> Result (cost=0.00..0.01 rows=1 width=4) (actual
> time=0.002..0.002 rows=1.00 loops=1)
> -> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4)
> (actual time=0.203..85.925 rows=1000000.00 loops=1)
> Buffers: shared hit=2211 read=2214
> -> Hash (cost=0.09..0.09 rows=7 width=4) (actual
> time=0.067..0.068 rows=7.00 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 9kB
> -> Values Scan on "*VALUES*" (cost=0.00..0.09 rows=7
> width=4) (actual time=0.004..0.023 rows=7.00 loops=1)
> Planning Time: 0.321 ms
> Execution Time: 210.409 ms
> (12 rows)
>
> Only when we have all consts in the array, we can have SAOP hashing
> which is competitive with hashing of VALUES node. So, I'm continuing
> with my version of patch.
>
> Links.
> 1.
> https://www.postgresql.org/message-id/CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw%40mail.gmail.com
>
>
Okay, I agree with you.

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Re: Replace IN VALUES with ANY in WHERE clauses during optimization at 2025-04-02 19:00:49 from Alexander Korotkov

Responses

Re: Replace IN VALUES with ANY in WHERE clauses during optimization at 2025-04-03 15:26:48 from Alexander Korotkov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Aleksander Alekseev	2025-04-03 14:25:58	Re: [PATCH] Refactor SLRU to always use long file names
Previous Message	Robert Haas	2025-04-03 14:10:26	Re: RFC: Logging plan of the running query