Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-07-25 14:04:44
Message-ID: 759292d5-cb51-4b12-89fa-576c1d9b374d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22.07.2024 03:52, Alexander Korotkov wrote:
> Hi, Alena!
>
> Let me answer to some of your findings.
>
> On Mon, Jul 22, 2024 at 12:53 AM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> To be honest,I saw a larger problem. Look at the query bellow:
>>
>> master:
>>
>> alena(at)postgres=# create table t (a int not null, b int not null, c int not null);
>> insert into t (select 1, 1, i from generate_series(1,10000) i);
>> insert into t (select i, 2, 2 from generate_series(1,10000) i);
>> create index t_a_b_idx on t (a, b);
> Just a side note. As I mention in [1], there is missing statement
> create index t_a_b_idx on t (a, b);
> to get same plan as in [2].
>
>> create statistics t_a_b_stat (mcv) on a, b from t;
>> create statistics t_b_c_stat (mcv) on b, c from t;
>> vacuum analyze t;
>> CREATE TABLE
>> INSERT 0 10000
>> INSERT 0 10000
>> CREATE INDEX
>> CREATE STATISTICS
>> CREATE STATISTICS
>> VACUUM
>> alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;
>> QUERY PLAN
>> ------------------------------------------------------------------------------
>> Bitmap Heap Scan on t (cost=156.55..465.57 rows=5001 width=12)
>> Recheck Cond: (a = 1)
>> Filter: ((c = 2) AND ((b = 1) OR (b = 2)))
>> -> Bitmap Index Scan on t_a_b_idx (cost=0.00..155.29 rows=10001 width=0)
>> Index Cond: (a = 1)
>> (5 rows)
>>
>>
>> The query plan if v26[0] and v27[1] versions are equal and wrong in my opinion -where is c=2 expression?
>>
>> v27 [1]
>> alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;
>> QUERY PLAN
>> ------------------------------------------------------------------------------
>> Bitmap Heap Scan on t (cost=165.85..474.87 rows=5001 width=12)
>> Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>> Filter: (c = 2)
>> -> Bitmap Index Scan on t_a_b_idx (cost=0.00..164.59 rows=10001 width=0)
>> Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>> (5 rows)
>> v26 [0]
>> alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;
>> QUERY PLAN
>> ------------------------------------------------------------------------------
>> Bitmap Heap Scan on t (cost=165.85..449.86 rows=5001 width=12)
>> Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>> Filter: (c = 2)
>> -> Bitmap Index Scan on t_a_b_idx (cost=0.00..164.59 rows=10001 width=0)
>> Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>> (5 rows)
> I think both v26 and v27 are correct here. The c = 2 condition is in
> the Filter.
Yes, I see it and agree with that.
>> In addition, I noticed that the ANY expression will be formed only for first group and ignore for others, like in the sample bellow:
>>
>> v26 version [0]:
>>
>> alena(at)postgres=# explain select * from t where (b = 1 or b = 2) and (a = 2 or a=3);
>> QUERY PLAN
>> -----------------------------------------------------------------------------------
>> Index Scan using t_a_b_idx on t (cost=0.29..24.75 rows=2 width=12)
>> Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY ('{1,2}'::integer[])))
>> (2 rows)
>>
>> v27 version [1]:
>>
>> alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
>> QUERY PLAN
>> --------------------------------------------------------
>> Seq Scan on t (cost=0.00..509.00 rows=14999 width=12)
>> Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
>> (2 rows)
> Did you notice you're running different queries on v26 and v27 here?
> If you will run ton v27 the same query you run on v26, the plan also
> will be the same.
>
>> alena(at)postgres=# create index a_idx on t(a);
>> CREATE INDEX
>> alena(at)postgres=# create index b_idx on t(b);
>> CREATE INDEX
>> alena(at)postgres=# analyze;
>> ANALYZE
>>
>> v26:
>>
>> alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
>> QUERY PLAN
>> ------------------------------------------------------------------------------------
>> Bitmap Heap Scan on t (cost=17.18..30.94 rows=4 width=12)
>> Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY ('{2,3}'::integer[])))
>> -> BitmapOr (cost=17.18..17.18 rows=4 width=0)
>> -> Bitmap Index Scan on a_idx (cost=0.00..8.59 rows=2 width=0)
>> Index Cond: (a = ANY ('{2,3}'::integer[]))
>> -> Bitmap Index Scan on a_idx (cost=0.00..8.59 rows=2 width=0)
>> Index Cond: (a = ANY ('{2,3}'::integer[]))
>> (7 rows)
>>
>> v27:
>>
>> alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
>> QUERY PLAN
>> --------------------------------------------------------
>> Seq Scan on t (cost=0.00..509.00 rows=14999 width=12)
>> Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
>> (2 rows)
>>
>> The behavior in version 26 is incorrect, but in version 27, it does not select anything other than seqscan
> Please, check that there is still possibility to the generate BitmapOr plan.
It is fine, I think. The transformation works, but due to the fact that
index columns are different for two indexes, the transformation hasn't
been applied.
>
> # explain select * from t where (b = 1 or b = 2 or a = 2 or a = 3);
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Bitmap Heap Scan on t (cost=326.16..835.16 rows=14999 width=12)
> Recheck Cond: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
> -> BitmapOr (cost=326.16..326.16 rows=20000 width=0)
> -> Bitmap Index Scan on t_b_c_idx (cost=0.00..151.29
> rows=10000 width=0)
> Index Cond: (b = 1)
> -> Bitmap Index Scan on t_b_c_idx (cost=0.00..151.29
> rows=10000 width=0)
> Index Cond: (b = 2)
> -> Bitmap Index Scan on t_a_b_idx (cost=0.00..4.29 rows=1 width=0)
> Index Cond: (a = 2)
> -> Bitmap Index Scan on t_a_b_idx (cost=0.00..4.29 rows=1 width=0)
> Index Cond: (a = 3)
>
> It has higher cost than SeqScan plan, but I think it would be selected
> on larger tables. And yes, this is not ideal, because it fails to
> generate BitmapOr over two IndexScans on SAOPs. But it's not worse
> than what current master does. An optimization doesn't have to do
> everything it could possible do. So, I think this could be improved
> in a separate patch.
>
> Links
> 1.https://www.postgresql.org/message-id/CAPpHfdvhWE5pArZhgJeLViLx3-A3rxEREZvfkTj3E%3Dh7q-Bx9w%40mail.gmail.com
> 2.https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com

Yes, I see and agree with you.

To be honest, I have found a big problem in this patch - we try to
perform the transformation every time we examime a column:

for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...

}

I have fixed it and moved the transformation before going through the loop.

I try to make an array expression for "OR" expr, but at the same time I
form the result as an "AND" expression, consisting of an "Array"
expression and "OR" expressions, and then I check whether there is an
index for this column, if so, I save it and write down the
transformation. I also had to return the previous part of the patch,
where we formed "ANY" groups, since we could end up with several such
groups. I hope I made my idea clear, but if not, please tell me.

Unfortunately, I have got the different result one of the query from
regression tests and I'm not sure if it is correct:

diff -U3
/home/alena/postgrespro_or3/src/test/regress/expected/create_index.out
/home/alena/postgrespro_or3/src/test/regress/results/create_index.out
---
/home/alena/postgrespro_or3/src/test/regress/expected/create_index.out
2024-07-23 18:51:13.077311360 +0300 +++
/home/alena/postgrespro_or3/src/test/regress/results/create_index.out
2024-07-25 16:43:56.895132328 +0300 @@ -1860,13 +1860,14 @@ EXPLAIN
(COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR
tenthous = (SELECT 1 + 2) OR tenthous = 42); - QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------------------------------
Index Scan using tenk1_thous_tenthous on tenk1 - Index Cond: ((thousand
= 42) AND (tenthous = ANY (ARRAY[1, (InitPlan 1).col1, 42]))) + Index
Cond: ((thousand = 42) AND (tenthous = ANY ('{1,-1,42}'::integer[]))) +
Filter: ((tenthous = 1) OR (tenthous = (InitPlan 1).col1) OR (tenthous =
42)) InitPlan 1 -> Result -(4 rows) +(5 rows) SELECT * FROM tenk1 WHERE
thousand = 42 AND (tenthous = 1 OR tenthous = (SELECT 1 + 2) OR tenthous
= 42);

I'm researching what's wrong here now.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v28-Transform-OR-clauses-to-ANY-expression.patch text/x-patch 32.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2024-07-25 14:09:09 Re: CREATE MATERIALIZED VIEW
Previous Message Dagfinn Ilmari Mannsåker 2024-07-25 13:56:07 Re: CREATE MATERIALIZED VIEW