Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2025-02-03 11:54:47
Message-ID: 1a4f1631-86c1-4185-a520-03b24a389449@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 03.02.2025 14:32, Alexander Korotkov wrote:
> On Mon, Feb 3, 2025 at 12:22 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> Thank you for updated version! I agree for your version of the code.
>>
>> On 02.02.2025 21:00, Alexander Korotkov wrote:
>>
>> On Fri, Jan 31, 2025 at 4:31 PM Alena Rybakina
>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>
>> I started reviewing at the patch and saw some output "ERROR" in the output of the test and is it okay here?
>>
>> SELECT * FROM tenk1 t1
>> WHERE t1.thousand = 42 OR t1.thousand = (SELECT t2.tenthous FROM tenk1 t2 WHERE t2.thousand = t1.tenthous);
>> ERROR: more than one row returned by a subquery used as an expression
>>
>> The output is correct for this query. But the query is very
>> unfortunate for the regression test. I've revised query in the v47
>> revision [1].
>>
>> Links.
>> 1.https://www.postgresql.org/message-id/CAPpHfdsBZmNt9qUoJBqsQFiVDX1%3DyCKpuVAt1YnR7JCpP%3Dk8%2BA%40mail.gmail.com
>>
>> While analyzing the modified query plan from the regression test, I noticed that despite using a full seqscan for table t2 in the original plan,
>> its results are cached by Materialize node, and this can significantly speed up the execution of the NestedLoop algorithm.
>>
>> For example, after running the query several times, I got results that show that the query execution time was twice as bad.
>>
>> Original plan:
>>
>> EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..70067.00 rows=2502499 width=24) (actual time=0.015..1123.247 rows=2003000 loops=1) Join Filter: ((t1.a = t2.b) OR (t1.a = 1)) Rows Removed by Join Filter: 1997000 Buffers: shared hit=22 -> Seq Scan on bitmap_split_or t1 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.006..0.372 rows=2000 loops=1) Buffers: shared hit=11 -> Materialize (cost=0.00..41.00 rows=2000 width=12) (actual time=0.000..0.111 rows=2000 loops=2000) Storage: Memory Maximum Storage: 110kB Buffers: shared hit=11 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.003..0.188 rows=2000 loops=1) Buffers: shared hit=11 Planning Time: 0.118 ms Execution Time: 1204.874 ms (13 rows)
>>
>> Query plan after the patch:
>>
>> EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.28..56369.00 rows=2502499 width=24) (actual time=0.121..2126.606 rows=2003000 loops=1) Buffers: shared hit=16009 read=2 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.017..0.652 rows=2000 loops=1) Buffers: shared hit=11 -> Index Scan using t_a_b_idx on bitmap_split_or t1 (cost=0.28..18.15 rows=1002 width=12) (actual time=0.044..0.627 rows=1002 loops=2000) Index Cond: (a = ANY (ARRAY[t2.b, 1])) Buffers: shared hit=15998 read=2 Planning Time: 0.282 ms Execution Time: 2344.367 ms (9 rows)
>>
>> I'm afraid that we may lose this with this optimization. Maybe this can be taken into account somehow, what do you think?
> The important aspect is that the second plan have lower cost than the
> first one. So, that's the question to the cost model. The patch just
> lets optimizer consider more comprehensive plurality of paths. You
> can let optimizer select the first plan by tuning *_cost params. For
> example, setting cpu_index_tuple_cost = 0.02 makes first plan win for
> me.
>
> Other than that the test query is quite unfortunate as t1.a=1 is very
> frequent. I've adjusted the query so that nested loop with index scan
> wins both in cost and execution time.
>
> I've also adjusted another test query as proposed by Andrei.
>
> I'm going to push this patch is there is no more notes.
>
> Links.
> 1.https://www.postgresql.org/message-id/fc1017ca-877b-4f86-b491-154cf123eedd%40gmail.com
>

Okay.I agree with your codeand have no more notes

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2025-02-03 12:03:27 Avoid updating inactive_since for invalid replication slots
Previous Message Ashutosh Bapat 2025-02-03 11:46:09 Re: NOT ENFORCED constraint feature