From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2025-02-03 10:22:20 |
Message-ID: | 036d64e6-4d47-485c-b047-6664d452b3e8@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for updated version! I agree for your version of the code.
On 02.02.2025 21:00, Alexander Korotkov wrote:
> On Fri, Jan 31, 2025 at 4:31 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> I started reviewing at the patch and saw some output "ERROR" in the output of the test and is it okay here?
>>
>> SELECT * FROM tenk1 t1
>> WHERE t1.thousand = 42 OR t1.thousand = (SELECT t2.tenthous FROM tenk1 t2 WHERE t2.thousand = t1.tenthous);
>> ERROR: more than one row returned by a subquery used as an expression
> The output is correct for this query. But the query is very
> unfortunate for the regression test. I've revised query in the v47
> revision [1].
>
> Links.
> 1.https://www.postgresql.org/message-id/CAPpHfdsBZmNt9qUoJBqsQFiVDX1%3DyCKpuVAt1YnR7JCpP%3Dk8%2BA%40mail.gmail.com
While analyzing the modified query plan from the regression test, I
noticed that despite using a full seqscan for table t2 in the original
plan,
its results are cached by Materialize node, and this can significantly
speed up the execution of the NestedLoop algorithm.
For example, after running the query several times, I got results that
show that the query execution time was twice as bad.
Original plan:
EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2
WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..70067.00 rows=2502499 width=24) (actual
time=0.015..1123.247 rows=2003000 loops=1) Join Filter: ((t1.a = t2.b)
OR (t1.a = 1)) Rows Removed by Join Filter: 1997000 Buffers: shared
hit=22 -> Seq Scan on bitmap_split_or t1 (cost=0.00..31.00 rows=2000
width=12) (actual time=0.006..0.372 rows=2000 loops=1) Buffers: shared
hit=11 -> Materialize (cost=0.00..41.00 rows=2000 width=12) (actual
time=0.000..0.111 rows=2000 loops=2000) Storage: Memory Maximum Storage:
110kB Buffers: shared hit=11 -> Seq Scan on bitmap_split_or t2
(cost=0.00..31.00 rows=2000 width=12) (actual time=0.003..0.188
rows=2000 loops=1) Buffers: shared hit=11 Planning Time: 0.118 ms
Execution Time: 1204.874 ms (13 rows)
Query plan after the patch:
EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2
WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.28..56369.00 rows=2502499 width=24) (actual
time=0.121..2126.606 rows=2003000 loops=1) Buffers: shared hit=16009
read=2 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000
width=12) (actual time=0.017..0.652 rows=2000 loops=1) Buffers: shared
hit=11 -> Index Scan using t_a_b_idx on bitmap_split_or t1
(cost=0.28..18.15 rows=1002 width=12) (actual time=0.044..0.627
rows=1002 loops=2000) Index Cond: (a = ANY (ARRAY[t2.b, 1])) Buffers:
shared hit=15998 read=2 Planning Time: 0.282 ms Execution Time: 2344.367
ms (9 rows)
I'm afraid that we may lose this with this optimization. Maybe this can
be taken into account somehow, what do you think?
--
Regards,
Alena Rybakina
Postgres Professional
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Jones | 2025-02-03 10:31:07 | Re: Truncate logs by max_log_size |
Previous Message | Amit Kapila | 2025-02-03 09:25:05 | Re: Introduce XID age and inactive timeout based replication slot invalidation |