Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Date: 2024-10-03 20:10:25
Message-ID: 843f8d10-0a84-478b-b14e-9e8e29024634@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 03.10.2024 22:52, Ivan Kush wrote:
>
> Hello, hackers! I with my friends propose the patch to replace IN
> VALUES to ANY in WHERE clauses.
>
> # Intro
>
> The `VALUES` in the `IN VALUES` construct is replaced with with an
> array of values when `VALUES` contains 1 column. In the end it will be
> replaced with ANY by the existing function makeA_Expr
> (src/backend/nodes/makefuncs.c)
>
> This improves performance, especially if the values are small.
>
> # Patch
>
> v1-in_values_to_array.patch
>
> # How realized
>
> `VALUES` statement corresponds to `values_clause` nonterminal symbol
> in gram.y, where it's parsed to `SelectStmt` node.
>
> `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1
> column, parser extracts data from `SelectStmt` and passes it
>
> to function call `makeSimpleA_Expr` where simple `A_Expr` is created.
>
> Later during optimizations of parser tree this `A_Expr` will be
> transformed to `ArrayExpr` (already realized in Postgres)
>
>
> # Authors.
> Author: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
> Author: Vadim Yacenko <vadim(dot)yacenko(at)tantorlabs(dot)com>
> Author: Alexander Simonov <alexander(dot)simonov(at)tantorlabs(dot)com>
>
> # Tests
> Implementation contains many regression tests of varying complexity,
> which check supported features.
>
> # Platform
> This patch was checkouted from tag REL_17_STABLE. Code is developed in
> Linux, doesn't contain platfrom-specific code, only Postgres internal
> data structures and functions.
>
> # Documentation
> Regression tests contain many examples
>
> # Performance
> It increases performance
>
> # Example
> Let's compare result. With path the execution time is significantly
> lower.
>
> We have a table table1 with 10000 rows.
>
> postgres=# \d table1;
>                          Table "public.table1"
>  Column |            Type             | Collation | Nullable | Default
> --------+-----------------------------+-----------+----------+---------
>  fld1   | timestamp without time zone |           | not null |
>  fld2   | bytea                       |           | not null |
> Indexes:
>     "table1index" btree (fld2)
>
> Let's execute several commands
> see commands.sql
>
> Plan no patch
> see plan_no_patch.txt
>
>
> Plan with patch
> see plan_with_patch.txt

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
    where x in (VALUES(1200), (1));
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on t (actual rows=1 loops=1)
   Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

Anlrey Lepikhov and I recently described this in an article [0] here and
the implementation already exists, but for now it was posted a binary
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-10-03 20:15:30 Re: POC, WIP: OR-clause support for indexes
Previous Message Laurenz Albe 2024-10-03 20:10:14 Re: On disable_cost