Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-11-30 08:05:31
Message-ID: a30691c4-a773-43bc-8c8f-339e6388c60e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry, I forgot to apply my patches. For the first experiment was
0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff and for the second
experiment was 0002-OR-to-ANY-in-index.diff.

On 30.11.2023 11:00, Alena Rybakina wrote:
> Hi!
>
>>
>>> Honestly, it seems very hard to avoid the conclusion that this
>>> transformation is being done at too early a stage. Parse analysis is
>>> not the time to try to do query optimization. I can't really believe
>>> that there's a way to produce a committable patch along these lines.
>>> Ideally, a transformation like this should be done after we know what
>>> plan shape we're using (or considering using), so that we can make
>>> cost-based decisions about whether to transform or not. But at the
>>> very least it should happen somewhere in the planner. There's really
>>> no justification for parse analysis rewriting the SQL that the user
>>> entered.
>>
>> Here, we assume that array operation is generally better than many ORs.
>> As a result, it should be more effective to make OR->ANY
>> transformation in the parser (it is a relatively lightweight
>> operation here) and, as a second phase, decompose that in the optimizer.
>> We implemented earlier prototypes in different places of the
>> optimizer, and I'm convinced that only this approach resolves the
>> issues we found.
>> Does this approach look weird? Maybe. We can debate it in this thread.
>
> I think this is incorrect, and the example of A. Korotkov confirms
> this. If we perform the conversion at the parsing stage, we will skip
> the more important conversion using OR expressions. I'll show you in
> the example below.
>
> First of all, I will describe my idea to combine two approaches to
> obtaining plans with OR to ANY transformation and ANY to OR
> transformation. I think they are both good, and we can't work with
> just one of them, we should consider both the option of OR
> expressions, and with ANY.
>
> I did this by creating a RelOptInfo with which has references from the
> original RelOptInfo, for which conversion is possible either from
> ANY->OR, or vice versa. After obtaining the necessary transformation,
> I started the procedure for obtaining the seq and index paths for both
> relations and then calculated their cost. The relation with the lowest
> cost is considered the best.
>
> I'm not sure if this is the best approach, but it's less complicated.
>
> I noticed that I got a lower cost for not the best plan, but I think
> this corresponds to another topic related to the wrong estimate
> calculation.
>
> 1. The first patch is a mixture of the original patch (when we perform
> the conversion of OR to ANY at the parsing stage), and when we perform
> the conversion at the index creation stage with the conversion to an
> OR expression. We can see that the query proposed by A.Korotkov did
> not have the best plan with ANY expression at all, and even despite
> receiving a query with OR expressions, we cannot get anything better
> than SeqScan, due to the lack of effective logical transformations
> that would have been performed if we had left the OR expressions.
>
> So, I got query plans using enable_or_transformation if it is enabled:
>
> postgres=# create table test as (select (random()*10)::int x,
> (random()*1000) y
> from generate_series(1,1000000) i);
> create index test_x_1_y on test (y) where x = 1;
> create index test_x_2_y on test (y) where x = 2;
> vacuum analyze test;
> SELECT 1000000
> CREATE INDEX
> CREATE INDEX
> VACUUM
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
> WARNING:  cost with original approach: - 20440.000000
> WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>
>  Gather  (cost=1000.00..12690.10 rows=1 width=12)
>    Workers Planned: 2
>    ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12)
>          Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
> (4 rows)
>
> and if it is off:
>
> postgres=# set enable_or_transformation =off;
> SET
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>
>  Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
>    Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
> '100'::double precision) AND (x = 2)))
>    ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
>          ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1
> width=0)
>                Index Cond: (y = '100'::double precision)
>          ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1
> width=0)
>                Index Cond: (y = '100'::double precision)
> (7 rows)
>
> 2. The second patch is my patch version when I moved the OR
> transformation in the s index formation stage:
>
> So, I got the best query plan despite the possible OR to ANY
> transformation:
>
> postgres=# create table test as (select (random()*10)::int x,
> (random()*1000) y
> from generate_series(1,1000000) i);
> create index test_x_1_y on test (y) where x = 1;
> create index test_x_2_y on test (y) where x = 2;
> vacuum analyze test;
> SELECT 1000000
> CREATE INDEX
> CREATE INDEX
> VACUUM
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
> WARNING:  cost with original approach: - 12.618000
> WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>
>  Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
>    Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
> '100'::double precision) AND (x = 2)))
>    ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
>          ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1
> width=0)
>                Index Cond: (y = '100'::double precision)
>          ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1
> width=0)
>                Index Cond: (y = '100'::double precision)
> (7 rows)
>
>
>
--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff text/x-patch 64.2 KB
0002-OR-to-ANY-in-index.diff text/x-patch 31.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-11-30 08:05:32 Re: pg_upgrade and logical replication
Previous Message Alena Rybakina 2023-11-30 08:00:28 Re: POC, WIP: OR-clause support for indexes