Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-07-28 09:59:32
Message-ID: 531fc0ab-371e-4235-97e3-dd2d077b6995@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27.07.2024 13:56, Alexander Korotkov wrote:
> On Thu, Jul 25, 2024 at 5:04 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> To be honest, I have found a big problem in this patch - we try to perform the transformation every time we examime a column:
>>
>> for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...
>>
>> }
>>
>> I have fixed it and moved the transformation before going through the loop.
> What makes you think there is a problem?

To be honest, I was bothered by the fact that we need to go through
expressions several times that obviously will not fit under other
conditions.
Just yesterday I thought that it would be worthwhile to create a list of
candidates - expressions that did not fit because the column did not
match the index (!match_index_to_operand(nconst_expr, indexcol, index)).

Another problem that is related to the first one that the boolexpr could
contain expressions referring to different operands, for example, both x
and y. that is, we may have the problem that the optimal "ANY"
expression may not be used, because the expression with x may come
earlier and the loop may end earlier.

alena(at)postgres=# create table b (x int, y int);
CREATE TABLE
alena(at)postgres=# insert into b select id, id from
generate_series(1,1000) as id;
INSERT 0 1000
alena(at)postgres=# create index x_idx on b(x);
CREATE INDEX
alena(at)postgres=# analyze;
ANALYZE
alena(at)postgres=# explain select * from b where y =3 or x =4 or x=5 or
x=6 or x = 7 or x=8 or x=9;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
   Filter: ((y = 3) OR (x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x =
8) OR (x = 9))
(2 rows)
alena(at)postgres=# explain select * from b where x =4 or x=5 or x=6 or x =
7 or x=8 or x=9 or y=1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
   Filter: ((x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 8) OR (x =
9) OR (y = 1))
(2 rows)
alena(at)postgres=# explain select * from b where x =4 or x=5 or x=6 or x =
7 or x=8 or x=9;
                           QUERY PLAN
----------------------------------------------------------------
 Index Scan using x_idx on b  (cost=0.28..12.75 rows=6 width=8)
   Index Cond: (x = ANY ('{4,5,6,7,8,9}'::integer[]))
(2 rows)

Furthermore expressions can be stored in a different order.
For example, first comes "AND" expr, and then group of "OR" expr, which
we can convert to "ANY" expr, but we won't do this due to the fact that
we will exit the loop early, according to this condition:

if (!IsA(sub_rinfo->clause, OpExpr))
           return NULL;

or it may occur due to other conditions.

alena(at)postgres=# create index x_y_idx on b(x,y);
CREATE INDEX
alena(at)postgres=# analyze;
ANALYZE

alena(at)postgres=# explain select * from b where (x = 2 and y =3) or x =4
or x=5 or x=6 or x = 7 or x=8 or x=9;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..35.00 rows=6 width=8)
   Filter: (((x = 2) AND (y = 3)) OR (x = 4) OR (x = 5) OR (x = 6) OR
(x = 7) OR (x = 8) OR (x = 9))
(2 rows)

Because of these reasons, I tried to save this and that transformation
together for each column and try to analyze for each expr separately
which method would be optimal.

> Do you have a test case
> illustrating a slow planning time?
No, I didn't have time to measure it and sorry for that. I'll do it.
> When v27 performs transformation for a particular column, it just
> stops facing the first unmatched OR entry. So,
> match_orclause_to_indexcol() examines just the first OR entry for all
> the columns excepts at most one. So, the check
> match_orclause_to_indexcol() does is not much slower than other
> match_*_to_indexcol() do.
>
> I actually think this could help performance in many cases, not hurt
> it. At least we get rid of O(n^2) complexity over the number of OR
> entries, which could be very many.

I agree with you that there is an overhead and your patch fixes this
problem, but optimizer needs to have a good ordering of expressions for
application.

I think we can try to move the transformation to another place where
there is already a loop pass, and also save two options "OR" expr and
"ANY" expr in one place (through BoolExpr) (like find_duplicate_ors
function) and teach the optimizer to determine which option is better,
for example, like now in match_orclause_to_indexcol() function.

What do you thing about it?

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2024-07-28 10:30:02 Re: Protocol question regarding Portal vs Cursor
Previous Message Peter Eisentraut 2024-07-28 08:36:46 Re: improve ssl error code, 2147483650