From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark
Date: 2024-10-16 14:01:21
Message-ID: 45323a23-47d9-46f5-8e1b-8f0c961a4fad@postgrespro.ru
Lists: pgsql-bugs
Hi!
On 16.10.2024 09:28, Andrei Lepikhov wrote:
> On 10/16/24 01:28, Ba Jinsheng wrote:
>> The major difference between the two query plans is that the first one
>> has an additional *Sort*. I believe the second query plan is more
>> efficient. Similar to my last report, perhaps we can optimize the code
>> to enable it.
> I would like to know if you can improve that case by switching from
> the sorted group to a hashed one.
> I see huge underestimation because of the HAVING clause on an
> aggregate. It would be interesting to correct the prediction and
> observe what will happen.
>
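As a quick way to try that, here is a minimal sketch using standard planner GUCs (the work_mem value is only illustrative, and whether the planner actually chooses HashAggregate here still depends on its cardinality estimate):

-- Discourage sort-based grouping so a HashAggregate path can win:
set enable_sort = off;
-- Give the aggregate hash table enough memory to stay within work_mem:
set work_mem = '256MB';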
Unfortunately, my laptop is not very fast, so my execution time is
*25822.899 ms*.
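For reference, TPC-H Q18 with the parameter these plans use looks roughly like this; I reconstructed it from the plans above (the HAVING constant, sort keys, and LIMIT), so treat it as a sketch rather than the exact benchmark text:

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_orderkey in (select l_orderkey
                     from lineitem
                     group by l_orderkey
                     having sum(l_quantity) > 314)
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
limit 100;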
The initial query plan on my laptop is the same:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=668833.97..668834.22 rows=100 width=71) (actual time=25814.870..25822.461 rows=9 loops=1)
   ->  Sort  (cost=668833.97..670117.12 rows=513262 width=71) (actual time=25814.867..25822.454 rows=9 loops=1)
         Sort Key: orders.o_totalprice DESC, orders.o_orderdate
         Sort Method: quicksort  Memory: 25kB
         ->  Finalize GroupAggregate  (cost=584343.41..649217.47 rows=513262 width=71) (actual time=25814.736..25822.399 rows=9 loops=1)
               Group Key: customer.c_custkey, orders.o_orderkey
               ->  Gather Merge  (cost=584343.41..638524.51 rows=427718 width=71) (actual time=25814.716..25822.328 rows=9 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=583343.39..588155.22 rows=213859 width=71) (actual time=25792.566..25792.609 rows=3 loops=3)
                           Group Key: customer.c_custkey, orders.o_orderkey
                           ->  Sort  (cost=583343.39..583878.04 rows=213859 width=44) (actual time=25792.512..25792.527 rows=21 loops=3)
                                 Sort Key: customer.c_custkey, orders.o_orderkey
                                 Sort Method: quicksort  Memory: 25kB
                                 Worker 0:  Sort Method: quicksort  Memory: 26kB
                                 Worker 1:  Sort Method: quicksort  Memory: 26kB
                                 ->  Nested Loop  (cost=458633.58..557830.13 rows=213859 width=44) (actual time=25480.327..25792.187 rows=21 loops=3)
                                       ->  Parallel Hash Join  (cost=458633.15..492800.08 rows=53450 width=43) (actual time=25480.163..25791.800 rows=3 loops=3)
                                             Hash Cond: (orders.o_custkey = customer.c_custkey)
                                             ->  Hash Join  (cost=453626.90..487653.53 rows=53450 width=24) (actual time=25367.158..25677.611 rows=3 loops=3)
                                                   Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
                                                   ->  Parallel Seq Scan on orders  (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.110..237.788 rows=500000 loops=3)
                                                   ->  Hash  (cost=452023.40..452023.40 rows=128280 width=4) (actual time=25157.711..25157.713 rows=9 loops=3)
                                                         Buckets: 131072  Batches: 1  Memory Usage: 1025kB
                                                         ->  GroupAggregate  (cost=0.43..452023.40 rows=128280 width=4) (actual time=5918.735..25157.610 rows=9 loops=3)
                                                               Group Key: lineitem_1.l_orderkey
                                                               Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
                                                               Rows Removed by Filter: 1499991
                                                               ->  Index Scan using lineitem_pkey on lineitem lineitem_1  (cost=0.43..416242.51 rows=6001659 width=9) (actual time=0.186..11092.476 rows=6001215 loops=3)
                                             ->  Parallel Hash  (cost=4225.00..4225.00 rows=62500 width=23) (actual time=110.867..110.868 rows=50000 loops=3)
                                                   Buckets: 262144  Batches: 1  Memory Usage: 10304kB
                                                   ->  Parallel Seq Scan on customer  (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.071..49.232 rows=50000 loops=3)
                                       ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.096..0.109 rows=7 loops=9)
                                             Index Cond: (l_orderkey = orders.o_orderkey)
 Planning Time: 5.025 ms
 Execution Time: *25822.899 ms*
(36 rows)
To get a better plan, I ran the query with the AQO [0] extension, and it
cut the execution time in half. Unfortunately, AQO doesn't remember the
cardinality of hash nodes, so, to be honest, I set the correct
cardinality by hard-coding it in the source. I know this is the wrong
way to do it, but I haven't had enough time to fix it properly.
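Roughly, the AQO setup looks like this (a sketch; the exact GUC names and modes depend on the AQO version built from [0]):

-- postgresql.conf: shared_preload_libraries = 'aqo', then restart
create extension aqo;
set aqo.mode = 'learn';  -- collect cardinality feedback from executions
-- run the query several times so AQO can learn the node cardinalities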
I also disabled parallelism, and my query plan looks like this:
set max_parallel_maintenance_workers = 0;
set max_parallel_workers = 0;
set max_parallel_workers_per_gather = 0;
                                                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=451107.39..451107.48 rows=36 width=71) (actual time=12582.114..12582.136 rows=9 loops=1)
   ->  Sort  (cost=451107.39..451107.48 rows=36 width=71) (actual time=12582.112..12582.118 rows=9 loops=1)
         Sort Key: orders.o_totalprice DESC, orders.o_orderdate
         Sort Method: quicksort  Memory: 25kB
         ->  GroupAggregate  (cost=451105.65..451106.46 rows=36 width=71) (actual time=12581.981..12582.082 rows=9 loops=1)
               Group Key: customer.c_custkey, orders.o_orderkey
               ->  Sort  (cost=451105.65..451105.74 rows=36 width=44) (actual time=12581.951..12581.969 rows=63 loops=1)
                     Sort Key: customer.c_custkey, orders.o_orderkey
                     Sort Method: quicksort  Memory: 28kB
                     ->  Nested Loop  (cost=1.71..451104.72 rows=36 width=44) (actual time=2704.881..12581.823 rows=63 loops=1)
                           ->  Nested Loop  (cost=1.28..451093.77 rows=9 width=43) (actual time=2704.826..12581.457 rows=9 loops=1)
                                 ->  Nested Loop  (cost=0.86..451089.74 rows=9 width=24) (actual time=2704.741..12581.024 rows=9 loops=1)
                                       ->  GroupAggregate  (cost=0.43..451013.73 rows=9 width=4) (actual time=2697.354..12522.126 rows=9 loops=1)
                                             Group Key: lineitem_1.l_orderkey
                                             Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
                                             Rows Removed by Filter: 1499991
                                             ->  Index Scan using lineitem_pkey on lineitem lineitem_1  (cost=0.43..416226.85 rows=6000615 width=9) (actual time=0.030..5515.665 rows=6001215 loops=1)
                                       ->  Index Scan using orders_pkey on orders  (cost=0.43..8.45 rows=1 width=20) (actual time=6.530..6.530 rows=1 loops=9)
                                             Index Cond: (o_orderkey = lineitem_1.l_orderkey)
                                 ->  Index Scan using customer_pkey on customer  (cost=0.42..0.45 rows=1 width=23) (actual time=0.039..0.039 rows=1 loops=9)
                                       Index Cond: (c_custkey = orders.o_custkey)
                           ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.026..0.033 rows=7 loops=9)
                                 Index Cond: (l_orderkey = orders.o_orderkey)
 Planning Time: 1.403 ms
 Execution Time: *12582.321 ms*
(25 rows)
[0] https://github.com/postgrespro/aqo
I think it is even more interesting that when I turned off parallelism
and built the query plan without AQO, the execution time was still
significantly reduced:
                                                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=722284.69..722284.94 rows=100 width=71) (actual time=16251.401..16251.413 rows=9 loops=1)
   ->  Sort  (cost=722284.69..723567.84 rows=513262 width=71) (actual time=16251.399..16251.408 rows=9 loops=1)
         Sort Key: orders.o_totalprice DESC, orders.o_orderdate
         Sort Method: quicksort  Memory: 25kB
         ->  GroupAggregate  (cost=512218.34..702668.18 rows=513262 width=71) (actual time=16227.765..16251.371 rows=9 loops=1)
               Group Key: customer.c_custkey, orders.o_orderkey
               ->  Incremental Sort  (cost=512218.34..692402.94 rows=513262 width=44) (actual time=16227.735..16251.241 rows=63 loops=1)
                     Sort Key: customer.c_custkey, orders.o_orderkey
                     Presorted Key: customer.c_custkey
                     Full-sort Groups: 2  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                     ->  Nested Loop  (cost=512217.20..678432.66 rows=513262 width=44) (actual time=16094.663..16251.114 rows=63 loops=1)
                           ->  Merge Join  (cost=512216.77..522360.54 rows=128280 width=43) (actual time=16094.596..16250.531 rows=9 loops=1)
                                 Merge Cond: (customer.c_custkey = orders.o_custkey)
                                 ->  Index Scan using customer_pkey on customer  (cost=0.42..7524.45 rows=150000 width=23) (actual time=0.039..148.356 rows=147198 loops=1)
                                 ->  Materialize  (cost=512216.29..512857.69 rows=128280 width=24) (actual time=16068.493..16068.529 rows=9 loops=1)
                                       ->  Sort  (cost=512216.29..512536.99 rows=128280 width=24) (actual time=16068.478..16068.496 rows=9 loops=1)
                                             Sort Key: orders.o_custkey
                                             Sort Method: quicksort  Memory: 25kB
                                             ->  Hash Join  (cost=453626.90..498700.41 rows=128280 width=24) (actual time=15309.799..16068.390 rows=9 loops=1)
                                                   Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
                                                   ->  Seq Scan on orders  (cost=0.00..41136.00 rows=1500000 width=20) (actual time=0.011..439.488 rows=1500000 loops=1)
                                                   ->  Hash  (cost=452023.40..452023.40 rows=128280 width=4) (actual time=15104.560..15104.562 rows=9 loops=1)
                                                         Buckets: 131072  Batches: 1  Memory Usage: 1025kB
                                                         ->  GroupAggregate  (cost=0.43..452023.40 rows=128280 width=4) (actual time=3731.232..15104.495 rows=9 loops=1)
                                                               Group Key: lineitem_1.l_orderkey
                                                               Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
                                                               Rows Removed by Filter: 1499991
                                                               ->  Index Scan using lineitem_pkey on lineitem lineitem_1  (cost=0.43..416242.51 rows=6001659 width=9) (actual time=0.034..6936.304 rows=6001215 loops=1)
                           ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.045..0.053 rows=7 loops=9)
                                 Index Cond: (l_orderkey = orders.o_orderkey)
 Planning Time: 1.782 ms
 Execution Time: *16251.698 ms*
(32 rows)
Looking at the code, I think the cost estimation problem in the
aggregation nodes is caused by incorrect selectivity estimation (the
cost_agg function) and might be solvable with extended statistics. But
to be honest, I'm still investigating this.
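If extended statistics do turn out to apply, the experiment would look something like this (a hypothetical sketch; whether the planner can actually use such statistics for the estimate of the HAVING filter is exactly what I am still checking):

create statistics lineitem_orderkey_qty (ndistinct)
  on l_orderkey, l_quantity from lineitem;
analyze lineitem;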
--
Regards,
Alena Rybakina
Postgres Professional