Re: POC, WIP: OR-clause support for indexes

From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-10-04 19:19:59
Message-ID: 668892c1-fb11-3a79-ce5e-1c194b7b3263@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29.09.2023 20:35, a.rybakina wrote:
>>
>> I'm sorry I didn't write for a long time, but I really had a very
>> difficult month, now I'm fully back to work.
>>
>> *I was able to implement the patches to the end and moved the
>> transformation of "OR" expressions to ANY.* I haven't seen a big
>> difference between them yet, one has a transformation before
>> calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the
>> other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests
>> are passing, I don't see any problems with selectivity, nothing has
>> fallen into the coredump, but I found some incorrect transformations.
>> What is the reason for these inaccuracies, I have not found, but, to
>> be honest, they look unusual). Gave the error below.
>>
>> In the patch, I don't like that I had to drag three libraries from
>> parsing until I found a way around it.The advantage of this approach
>> compared to the other ([1]) is that at this stage all possible or
>> transformations are performed, compared to the patch, where the
>> transformation was done at the parsing stage. That is, here, for
>> example, there are such optimizations in the transformation:
>>
>>
>> I took the common element out of the bracket and the rest is
>> converted to ANY, while, as noted by Peter Geoghegan, we did not have
>> several bitmapscans, but only one scan through the array.
>>
>> postgres=# explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR
>> prolang = 13 AND prolang = 3;
>>                                               QUERY PLAN
>> -------------------------------------------------------------------------------------------------------
>>  Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual
>> time=1.167..1.168 rows=0 loops=1)
>>    Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid,
>> '2'::oid, '3'::oid])))
>>    Rows Removed by Filter: 3302
>>  Planning Time: 0.146 ms
>>  Execution Time: 1.191 ms
>> (5 rows)
>> *Falls into coredump at me:*
>> explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
>>
>
Hi, all!

I fixed the kernel dump issue and all the regression tests were
successful, but I discovered another problem when I added my own
regression tests.
Some queries that contain "or" expressions do not convert to "ANY". I
have described this in more detail using diff as expected and real results:

diff -U3
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
---
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out
2023-10-04 21:54:12.496282667 +0300
+++
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
2023-10-04 21:55:41.665422459 +0300
@@ -1925,17 +1925,20 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
+                                                        QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((thousand = 42) AND (tenthous = ANY
('{1,3}'::integer[]))) OR (thousand = 41))
+         Recheck Cond: ((((thousand = 42) AND (tenthous = 1)) OR
((thousand = 42) AND (tenthous = 3))) OR (thousand = 41))
          ->  BitmapOr
-               ->  Bitmap Index Scan on tenk1_thous_tenthous
-                     Index Cond: ((thousand = 42) AND (tenthous = ANY
('{1,3}'::integer[])))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 1))
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 3))
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: (thousand = 41)
-(8 rows)
+(11 rows)
@@ -1946,24 +1949,50 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+                                            QUERY PLAN
+---------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: ((thousand = 42) OR ((thousand = 42) AND
(tenthous = 1)) OR (tenthous = 1))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (thousand = 42)
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: ((thousand = 42) AND (tenthous = 1))
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (tenthous = 1)
+(10 rows)
+
+SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+ count
+-------
+    11
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous <
2) OR thousand = 41;
-                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
+                                                       QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR
(thousand = ANY ('{42,99}'::integer[])))) OR (thousand = 41))
+         Recheck Cond: (((hundred = 42) AND ((thousand = 42) OR
(thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
          ->  BitmapOr
                ->  BitmapAnd
                      ->  Bitmap Index Scan on tenk1_hundred
                            Index Cond: (hundred = 42)
                      ->  BitmapOr
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (tenthous < 2)
+                                 Index Cond: (thousand = 42)
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (thousand = ANY
('{42,99}'::integer[]))
+                                 Index Cond: (thousand = 99)
+                           ->  Bitmap Index Scan on tenk1_thous_tenthous
+                                 Index Cond: (tenthous < 2)
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: (thousand = 41)
-(14 rows)
+(16 rows)

diff -U3
/home/alena/postgrespro__copy6/src/test/regress/expected/join.out
/home/alena/postgrespro__copy6/src/test/regress/results/join.out
--- /home/alena/postgrespro__copy6/src/test/regress/expected/join.out
2023-10-04 21:53:55.632069079 +0300
+++ /home/alena/postgrespro__copy6/src/test/regress/results/join.out
2023-10-04 21:55:46.597485979 +0300
 explain (costs off)
 select * from tenk1 a join tenk1 b on
   (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
   ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 =
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR
(a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 =
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND
(b.hundred = 4)))
    ->  Seq Scan on tenk1 b
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 < 20)
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)

 explain (costs off)
 select * from tenk1 a join tenk1 b on
   (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
   ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 =
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR
(a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 =
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND
(b.hundred = 4)))
    ->  Seq Scan on tenk1 b
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 < 20)
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)

I haven't been able to fully deal with this problem yet

I have attached my experimental patch with the code.

Attachment Content-Type Size
0001-Replace-OR-clause-to-ANY-expressions.diff text/x-patch 36.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-10-04 19:33:27 Re: trying again to get incremental backup
Previous Message Robert Haas 2023-10-04 19:10:05 Re: Request for comment on setting binary format output per session