From: | "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(at)eisentraut(dot)org> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-10-04 19:19:59 |
Message-ID: | 668892c1-fb11-3a79-ce5e-1c194b7b3263@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 29.09.2023 20:35, a.rybakina wrote:
>>
>> I'm sorry I didn't write for a long time, but I really had a very
>> difficult month, now I'm fully back to work.
>>
>> *I was able to implement the patches to the end and moved the
>> transformation of "OR" expressions to ANY.* I haven't seen a big
>> difference between them yet, one has a transformation before
>> calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the
>> other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests
>> are passing, I don't see any problems with selectivity, nothing has
>> fallen into the coredump, but I found some incorrect transformations.
>> What is the reason for these inaccuracies, I have not found, but, to
>> be honest, they look unusual). Gave the error below.
>>
>> In the patch, I don't like that I had to drag three libraries from
>> parsing until I found a way around it.The advantage of this approach
>> compared to the other ([1]) is that at this stage all possible or
>> transformations are performed, compared to the patch, where the
>> transformation was done at the parsing stage. That is, here, for
>> example, there are such optimizations in the transformation:
>>
>>
>> I took the common element out of the bracket and the rest is
>> converted to ANY, while, as noted by Peter Geoghegan, we did not have
>> several bitmapscans, but only one scan through the array.
>>
>> postgres=# explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR
>> prolang = 13 AND prolang = 3;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------
>> Seq Scan on pg_proc p1 (cost=0.00..151.66 rows=1 width=68) (actual
>> time=1.167..1.168 rows=0 loops=1)
>> Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid,
>> '2'::oid, '3'::oid])))
>> Rows Removed by Filter: 3302
>> Planning Time: 0.146 ms
>> Execution Time: 1.191 ms
>> (5 rows)
>> *Falls into coredump at me:*
>> explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
>>
>
Hi, all!
I fixed the kernel dump issue and all the regression tests were
successful, but I discovered another problem when I added my own
regression tests.
Some queries that contain "or" expressions do not convert to "ANY". I
have described this in more detail using diff as expected and real results:
diff -U3
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
---
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out
2023-10-04 21:54:12.496282667 +0300
+++
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
2023-10-04 21:55:41.665422459 +0300
@@ -1925,17 +1925,20 @@
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
- QUERY PLAN
---------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND (tenthous = ANY
('{1,3}'::integer[]))) OR (thousand = 41))
+ Recheck Cond: ((((thousand = 42) AND (tenthous = 1)) OR
((thousand = 42) AND (tenthous = 3))) OR (thousand = 41))
-> BitmapOr
- -> Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: ((thousand = 42) AND (tenthous = ANY
('{1,3}'::integer[])))
+ -> BitmapOr
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: ((thousand = 42) AND (tenthous = 1))
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: ((thousand = 42) AND (tenthous = 3))
-> Bitmap Index Scan on tenk1_thous_tenthous
Index Cond: (thousand = 41)
-(8 rows)
+(11 rows)
@@ -1946,24 +1949,50 @@
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
+ WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tenk1
+ Recheck Cond: ((thousand = 42) OR ((thousand = 42) AND
(tenthous = 1)) OR (tenthous = 1))
+ -> BitmapOr
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: (thousand = 42)
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: ((thousand = 42) AND (tenthous = 1))
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: (tenthous = 1)
+(10 rows)
+
+SELECT count(*) FROM tenk1
+ WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+ count
+-------
+ 11
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous <
2) OR thousand = 41;
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR
(thousand = ANY ('{42,99}'::integer[])))) OR (thousand = 41))
+ Recheck Cond: (((hundred = 42) AND ((thousand = 42) OR
(thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
-> BitmapOr
-> BitmapAnd
-> Bitmap Index Scan on tenk1_hundred
Index Cond: (hundred = 42)
-> BitmapOr
-> Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (tenthous < 2)
+ Index Cond: (thousand = 42)
-> Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand = ANY
('{42,99}'::integer[]))
+ Index Cond: (thousand = 99)
+ -> Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: (tenthous < 2)
-> Bitmap Index Scan on tenk1_thous_tenthous
Index Cond: (thousand = 41)
-(14 rows)
+(16 rows)
diff -U3
/home/alena/postgrespro__copy6/src/test/regress/expected/join.out
/home/alena/postgrespro__copy6/src/test/regress/results/join.out
--- /home/alena/postgrespro__copy6/src/test/regress/expected/join.out
2023-10-04 21:53:55.632069079 +0300
+++ /home/alena/postgrespro__copy6/src/test/regress/results/join.out
2023-10-04 21:55:46.597485979 +0300
explain (costs off)
select * from tenk1 a join tenk1 b on
(a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop
- Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 =
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR
(a.unique1 = 3))
+ Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 =
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND
(b.hundred = 4)))
-> Seq Scan on tenk1 b
-> Materialize
-> Bitmap Heap Scan on tenk1 a
- Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+ Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
-> BitmapOr
-> Bitmap Index Scan on tenk1_unique1
Index Cond: (unique1 < 20)
-> Bitmap Index Scan on tenk1_unique1
+ Index Cond: (unique1 = 3)
+ -> Bitmap Index Scan on tenk1_unique1
Index Cond: (unique1 = 1)
-> Bitmap Index Scan on tenk1_unique2
- Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
- -> Bitmap Index Scan on tenk1_unique1
- Index Cond: (unique1 = 3)
-(15 rows)
+ Index Cond: (unique2 = 3)
+ -> Bitmap Index Scan on tenk1_unique2
+ Index Cond: (unique2 = 7)
+(17 rows)
explain (costs off)
select * from tenk1 a join tenk1 b on
(a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop
- Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 =
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR
(a.unique1 = 3))
+ Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 =
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND
(b.hundred = 4)))
-> Seq Scan on tenk1 b
-> Materialize
-> Bitmap Heap Scan on tenk1 a
- Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+ Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
-> BitmapOr
-> Bitmap Index Scan on tenk1_unique1
Index Cond: (unique1 < 20)
-> Bitmap Index Scan on tenk1_unique1
+ Index Cond: (unique1 = 3)
+ -> Bitmap Index Scan on tenk1_unique1
Index Cond: (unique1 = 1)
-> Bitmap Index Scan on tenk1_unique2
- Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
- -> Bitmap Index Scan on tenk1_unique1
- Index Cond: (unique1 = 3)
-(15 rows)
+ Index Cond: (unique2 = 3)
+ -> Bitmap Index Scan on tenk1_unique2
+ Index Cond: (unique2 = 7)
+(17 rows)
I haven't been able to fully deal with this problem yet
I have attached my experimental patch with the code.
Attachment | Content-Type | Size |
---|---|---|
0001-Replace-OR-clause-to-ANY-expressions.diff | text/x-patch | 36.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2023-10-04 19:33:27 | Re: trying again to get incremental backup |
Previous Message | Robert Haas | 2023-10-04 19:10:05 | Re: Request for comment on setting binary format output per session |