Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Nikolay Shaplov <dhyan(at)nataraj(dot)su>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-07-21 22:53:51
Message-ID: 38289f6c-b8c0-4713-8fb5-703fe771872a@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi! Thank you for your contribution to this thread!

To be honest,I saw a larger problem. Look at the query bellow:

master:

alena(at)postgres=# create table t (a int not null, b int not null, c int
not null);
insert into t (select 1, 1, i from generate_series(1,10000) i);
insert into t (select i, 2, 2 from generate_series(1,10000) i);
create index t_a_b_idx on t (a, b);
create statistics t_a_b_stat (mcv) on a, b from t;
create statistics t_b_c_stat (mcv) on b, c from t;
vacuum analyze t;
CREATE TABLE
INSERT 0 10000
INSERT 0 10000
CREATE INDEX
CREATE STATISTICS
CREATE STATISTICS
VACUUM
alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b =
2) and c = 2;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=156.55..465.57 rows=5001 width=12)
   Recheck Cond: (a = 1)
   Filter: ((c = 2) AND ((b = 1) OR (b = 2)))
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001
width=0)
         Index Cond: (a = 1)
(5 rows)

The query plan if v26[0] and v27[1] versions are equal and wrong in my
opinion -where is c=2 expression?

v27 [1]
alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b =
2) and c = 2;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=165.85..474.87 rows=5001 width=12)
   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
   Filter: (c = 2)
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001
width=0)
         Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)
v26 [0]
alena(at)postgres=# explain select * from t where a = 1 and (b = 1 or b =
2) and c = 2;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=165.85..449.86 rows=5001 width=12)
   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
   Filter: (c = 2)
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001
width=0)
         Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)

In addition, I noticed that the ANY expression will be formed only for
first group and ignore for others, like in the sample bellow:

v26 version [0]:

alena(at)postgres=# explain select * from t where (b = 1 or b = 2) and (a =
2 or a=3);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.29..24.75 rows=2 width=12)
   Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY
('{1,2}'::integer[])))
(2 rows)

v27 version [1]:

alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2
or a=3);
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

alena(at)postgres=# create index a_idx on t(a);
CREATE INDEX
alena(at)postgres=# create index b_idx on t(b);
CREATE INDEX
alena(at)postgres=# analyze;
ANALYZE

v26:

alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2
or a=3);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=17.18..30.94 rows=4 width=12)
   Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY
('{2,3}'::integer[])))
   ->  BitmapOr  (cost=17.18..17.18 rows=4 width=0)
         ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
               Index Cond: (a = ANY ('{2,3}'::integer[]))
         ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
               Index Cond: (a = ANY ('{2,3}'::integer[]))
(7 rows)

v27:

alena(at)postgres=# explain select * from t where (b = 1 or b = 2 or a = 2
or a=3);
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

The behavior in version 26 is incorrect, but in version 27, it does not
select anything other than seqscan

Since Thursday I have been trying to add the code forming groups of
identical "OR" expressions, as in version 26. I'm currently debugging
errors.

On 21.07.2024 11:17, Nikolay Shaplov wrote:
> В письме от среда, 17 июля 2024 г. 22:36:19 MSK пользователь Alexander
> Korotkov написал:
>
> Hi All!
>
> I am continue reading the patch, now it's newer version
>
> First main question:
>
> As far a I can get, the entry point for OR->ANY convertation have been moved
> to match_clause_to_indexcol funtion, that checks if some restriction can use
> index for performance.
>
> The thing I do not understand what match_clause_to_indexcol actually received
> as arguments. Should this be set of expressions with OR in between grouped by
> one of the expression argument?
>
> If not I do not understand how this ever should work.
 The point is that we do the transformation for those columns that have
an index, since this transformation is most useful in these cases. we
pass the parameters index relation and column number to find out
information about it.
>
> The rest is about code readability
>
>> + if (bms_is_member(index->rel->relid, rinfo->right_relids))
>> + return NULL;
To be honest, I'm not sure that I understand your question. Could you
explain me?
> This check it totally not obvious for person who is not deep into postgres
> code. There should go comment explaining what are we checking for, and why it
> does not suit our purposes
>
>
>> + foreach(lc, orclause->args)
>> + {
I'll add it, thank you.
> Being no great expert in postgres code, I am confused what are we iterating on
> here? Two arguments of OR statement? (a>1) OR (b>2) those in brackets? Or
> what? Comment explaining that would be a great help here.
>
>
>> +if (sub_rinfo->is_pushed_down != rinfo->is_pushed_down ||
>> + sub_rinfo->is_clone != rinfo->is_clone ||
>> + sub_rinfo->security_level != rinfo->security_level ||
>> + !bms_equal(sub_rinfo->required_relids, rinfo->required_relids) ||
>> + !bms_equal(sub_rinfo->incompatible_relids, rinfo-
> incompatible_relids) ||
>> + !bms_equal(sub_rinfo->outer_relids, rinfo->outer_relids))
>> + {
I'll add it.
> This check it totally mind-blowing... What in the name of existence is going
> on here?
>
> I would suggest to split these checks into parts (compiler optimizer should
> take care about overhead) and give each part a sane explanation.

Alexander suggested moving the transformation to another place and it is
correct in my opinion. All previous problems are now gone.
But he also cut the code - he made a transformation for one group of
"OR" expressions. I agree, some parts don't yet
provide enough explanation of what's going on. I'm correcting this now.

Speaking of the changes according to your suggestions, I made them in
version 26 [0] and just part of that code will end up in the current
version of the patch to process all groups of "OR" expressions.

I'll try to do this as best I can, but it took me a while to figure out
how to properly organize RestrictInfo in the index.

[0]
https://www.postgresql.org/message-id/3b9bb831-da52-4779-8f3e-f8b6b83ba41f%40postgrespro.ru

[1]
https://www.postgresql.org/message-id/CAPpHfdvhWE5pArZhgJeLViLx3-A3rxEREZvfkTj3E%3Dh7q-Bx9w%40mail.gmail.com

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-07-21 23:27:37 Re: xid_wraparound tests intermittent failure.
Previous Message Tom Lane 2024-07-21 21:36:48 Re: xid_wraparound tests intermittent failure.