Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-06-17 10:33:49
Message-ID: 7d5aed92-d4cc-4b76-8ae0-051d182c9eec@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, thank you for your work with this subject!

On 14.06.2024 15:00, Alexander Korotkov wrote:
> On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
>> I've revised the patch. Did some beautification, improvements for
>> documentation, commit messages etc.
>>
>> I've pushed the 0001 patch without 0002. I think 0001 is good by
>> itself given that there is the or_to_any_transform_limit GUC option.
>> The more similar OR clauses are here the more likely grouping them
>> into SOAP will be a win. But I've changed the default value to 5.
>> This will make it less invasive and affect only queries with obvious
>> repeating patterns. That also reduced the changes in the regression
>> tests expected outputs.
>>
>> Regarding 0002, it seems questionable since it could cause a planning
>> slowdown for SAOP's with large arrays. Also, it might reduce the win
>> of transformation made by 0001. So, I think we should skip it for
>> now.
> The patch has been reverted from pg17. Let me propose a new version
> for pg18 based on the valuable feedback from Tom Lane [1][2].
>
> * The transformation is moved to the stage of adding restrictinfos to
> the base relation (in particular add_base_clause_to_rel()). This
> leads to interesting consequences. While this allows IndexScans to
> use transformed clauses, BitmapScans and SeqScans seem unaffected.
> Therefore, I wasn't able to find a planning regression.
> * As soon as there is no planning regression anymore, I've removed
> or_to_any_transform_limit GUC, which was a source of critics.
> * Now, not only Consts allowed in the SAOP's list, but also Params.
> * The criticized hash based on expression jumbling has been removed.
> Now, the plain list is used instead.
> * OrClauseGroup now gets a legal node tag. That allows to mix it in
> the list with other nodes without hacks.
>
> I think this patch shouldn't be as good as before for optimizing
> performance of large OR lists, given that BitmapScans and SeqScans
> still deal with ORs. However, it allows IndexScans to handle more,
> doesn't seem to cause planning regression and therefore introduce no
> extra GUC. Overall, this seems like a good compromise.
>
> This patch could use some polishing, but I'd like to first hear some
> feedback on general design.
>
> Links
> 1.https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
> 2.https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us

Inoticedthat7librarieshave
beenaddedtosrc/backend/optimizer/plan/initsplan.c,andas faras
Iremember,TomLanehas alreadyexpresseddoubtsaboutthe
approachthatrequiresaddinga largenumberof libraries[0], but I'm afraid
I'm out of ideas about alternative approach.

In addition,Icheckedthe fixinthe
previouscasesthatyouwroteearlier[1]andnoticedthatSeqScancontinuesto
generate,unfortunately,withoutconvertingexpressions:

with patch:

create table test as (select (random()*10)::int x, (random()*1000) y
from generate_series(1,1000000) i); create index test_x_1_y on test (y)
where x = 1; create index test_x_2_y on test (y) where x = 2; vacuum
analyze test; SELECT 1000000 CREATE INDEX CREATE INDEX VACUUM
alena(at)postgres=# explain select * from test where (x = 1 or x = 2) and y
= 100; QUERY PLAN
--------------------------------------------------------------------------
Gather (cost=1000.00..12690.10 rows=1 width=12) Workers Planned: 2 ->
Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) Filter:
(((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (4 rows)
alena(at)postgres=# set enable_seqscan =off; SET alena(at)postgres=# explain
select * from test where (x = 1 or x = 2) and y = 100; QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on test (cost=10000000000.00..10000020440.00 rows=1 width=12)
Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (2 rows)

without patch:

--------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck
Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
'100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60
rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30
rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap
Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y
= '100'::double precision) (7 rows)

[0] https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us

[1]
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-06-17 10:50:56 Re: Avoid orphaned objects dependencies, take 3
Previous Message Alena Rybakina 2024-06-17 10:09:51 Re: Vacuum statistics