| From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> | 
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Performance Issue on Query 18 of TPC-H Benchmark | 
| Date: | 2024-10-16 14:01:21 | 
| Message-ID: | 45323a23-47d9-46f5-8e1b-8f0c961a4fad@postgrespro.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs | 
Hi!
On 16.10.2024 09:28, Andrei Lepikhov wrote:
> On 10/16/24 01:28, Ba Jinsheng wrote:
>> The major differerence between both query plans is the first one has 
>> additional *SORT*. I believe the second query plan is more efficient. 
>> Similar to my last report, perhaps we can optimize code to enable it.
> I would like to know if you can improve that case by switching from 
> the sorted group to a hashed one.
> I see huge underestimation because of the HAVING clause on an 
> aggregate. It would be interesting to correct the prediction and 
> observe what will happen.
>
Unfortunately my laptop is not so fast, so my execution time is 
*25822.899 ms.*
Initial query plan on my laptop is the same:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=668833.97..668834.22 rows=100 width=71) (actual 
time=25814.870..25822.461 rows=9 loops=1)
    ->  Sort  (cost=668833.97..670117.12 rows=513262 width=71) (actual 
time=25814.867..25822.454 rows=9 loops=1)
          Sort Key: orders.o_totalprice DESC, orders.o_orderdate
          Sort Method: quicksort  Memory: 25kB
          ->  Finalize GroupAggregate (cost=584343.41..649217.47 
rows=513262 width=71) (actual time=25814.736..25822.399 rows=9 loops=1)
                Group Key: customer.c_custkey, orders.o_orderkey
                ->  Gather Merge  (cost=584343.41..638524.51 rows=427718 
width=71) (actual time=25814.716..25822.328 rows=9 loops=1)
                      Workers Planned: 2
                      Workers Launched: 2
                      ->  Partial GroupAggregate 
(cost=583343.39..588155.22 rows=213859 width=71) (actual 
time=25792.566..25792.609 rows=3 loops=3)
                            Group Key: customer.c_custkey, orders.o_orderkey
                            ->  Sort  (cost=583343.39..583878.04 
rows=213859 width=44) (actual time=25792.512..25792.527 rows=21 loops=3)
                                  Sort Key: customer.c_custkey, 
orders.o_orderkey
                                  Sort Method: quicksort  Memory: 25kB
                                  Worker 0:  Sort Method: quicksort  
Memory: 26kB
                                  Worker 1:  Sort Method: quicksort  
Memory: 26kB
                                  ->  Nested Loop 
(cost=458633.58..557830.13 rows=213859 width=44) (actual 
time=25480.327..25792.187 rows=21 loops=3)
                                        ->  Parallel Hash Join 
(cost=458633.15..492800.08 rows=53450 width=43) (actual 
time=25480.163..25791.800 rows=3 loops=3)
                                              Hash Cond: 
(orders.o_custkey = customer.c_custkey)
                                              ->  Hash Join 
(cost=453626.90..487653.53 rows=53450 width=24) (actual 
time=25367.158..25677.611 rows=3 loops=3)
                                                    Hash Cond: 
(orders.o_orderkey = lineitem_1.l_orderkey)
                                                    ->  Parallel Seq 
Scan on orders  (cost=0.00..32386.00 rows=625000 width=20) (actual 
time=0.110..237.788 rows=500000 loops=3)
                                                    ->  Hash 
(cost=452023.40..452023.40 rows=128280 width=4) (actual 
time=25157.711..25157.713 rows=9 loops=3)
                                                          Buckets: 
131072  Batches: 1  Memory Usage: 1025kB
                                                          -> 
GroupAggregate  (cost=0.43..452023.40 rows=128280 width=4) (actual 
time=5918.735..25157.610 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
->  Index Scan using lineitem_pkey on lineitem lineitem_1 
(cost=0.43..416242.51 rows=6001659 width=9) (actual 
time=0.186..11092.476 rows=6001215 loops=3)
                                              ->  Parallel Hash 
(cost=4225.00..4225.00 rows=62500 width=23) (actual 
time=110.867..110.868 rows=50000 loops=3)
                                                    Buckets: 262144  
Batches: 1  Memory Usage: 10304kB
                                                    ->  Parallel Seq 
Scan on customer  (cost=0.00..4225.00 rows=62500 width=23) (actual 
time=0.071..49.232 rows=50000 loops=3)
                                        ->  Index Scan using 
lineitem_pkey on lineitem  (cost=0.43..1.06 rows=16 width=9) (actual 
time=0.096..0.109 rows=7 loops=9)
                                              Index Cond: (l_orderkey = 
orders.o_orderkey)
  Planning Time: 5.025 ms
* Execution Time: 25822.899 ms*
(36 rows)
To get a better plan, I ran the query with the AQO [0] extension and it 
cut the execution time in half. Unfortunately, AQO doesn't remember the 
power of hash nodes, and to be honest, I set the power to the correct 
power in the code.
I know this is the wrong way, but I haven't had enough time to fix it 
properly. And so here is my query plan:
I also disabled parallelism and my query plan like this:
set max_parallel_maintenance_workers =0;
set max_parallel_workers=0;
set max_parallel_workers_per_gather =0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=451107.39..451107.48 rows=36 width=71) (actual 
time=12582.114..12582.136 rows=9 loops=1)
    ->  Sort  (cost=451107.39..451107.48 rows=36 width=71) (actual 
time=12582.112..12582.118 rows=9 loops=1)
          Sort Key: orders.o_totalprice DESC, orders.o_orderdate
          Sort Method: quicksort  Memory: 25kB
          ->  GroupAggregate  (cost=451105.65..451106.46 rows=36 
width=71) (actual time=12581.981..12582.082 rows=9 loops=1)
                Group Key: customer.c_custkey, orders.o_orderkey
                ->  Sort  (cost=451105.65..451105.74 rows=36 width=44) 
(actual time=12581.951..12581.969 rows=63 loops=1)
                      Sort Key: customer.c_custkey, orders.o_orderkey
                      Sort Method: quicksort  Memory: 28kB
                      ->  Nested Loop  (cost=1.71..451104.72 rows=36 
width=44) (actual time=2704.881..12581.823 rows=63 loops=1)
                            ->  Nested Loop (cost=1.28..451093.77 rows=9 
width=43) (actual time=2704.826..12581.457 rows=9 loops=1)
                                  ->  Nested Loop (cost=0.86..451089.74 
rows=9 width=24) (actual time=2704.741..12581.024 rows=9 loops=1)
                                        ->  GroupAggregate 
(cost=0.43..451013.73 rows=9 width=4) (actual time=2697.354..12522.126 
rows=9 loops=1)
                                              Group Key: 
lineitem_1.l_orderkey
                                              Filter: 
(sum(lineitem_1.l_quantity) > '314'::numeric)
                                              Rows Removed by Filter: 
1499991
                                              ->  Index Scan using 
lineitem_pkey on lineitem lineitem_1  (cost=0.43..416226.85 rows=6000615 
width=9) (actual time=0.030..5515.665 rows=6001215 loops=1)
                                        ->  Index Scan using orders_pkey 
on orders  (cost=0.43..8.45 rows=1 width=20) (actual time=6.530..6.530 
rows=1 loops=9)
                                              Index Cond: (o_orderkey = 
lineitem_1.l_orderkey)
                                  ->  Index Scan using customer_pkey on 
customer  (cost=0.42..0.45 rows=1 width=23) (actual time=0.039..0.039 
rows=1 loops=9)
                                        Index Cond: (c_custkey = 
orders.o_custkey)
                            ->  Index Scan using lineitem_pkey on 
lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.026..0.033 
rows=7 loops=9)
                                  Index Cond: (l_orderkey = 
orders.o_orderkey)
  Planning Time: 1.403 ms
  Execution Time: *12582.321 ms*
(25 rows)
[0] https://github.com/postgrespro/aqo
I think it was more interesting when I turned off parallelism and tried 
to build a query plan without AQO, and the execution time there was 
significantly reduced:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=722284.69..722284.94 rows=100 width=71) (actual 
time=16251.401..16251.413 rows=9 loops=1)
    ->  Sort  (cost=722284.69..723567.84 rows=513262 width=71) (actual 
time=16251.399..16251.408 rows=9 loops=1)
          Sort Key: orders.o_totalprice DESC, orders.o_orderdate
          Sort Method: quicksort  Memory: 25kB
          ->  GroupAggregate  (cost=512218.34..702668.18 rows=513262 
width=71) (actual time=16227.765..16251.371 rows=9 loops=1)
                Group Key: customer.c_custkey, orders.o_orderkey
                ->  Incremental Sort  (cost=512218.34..692402.94 
rows=513262 width=44) (actual time=16227.735..16251.241 rows=63 loops=1)
                      Sort Key: customer.c_custkey, orders.o_orderkey
                      Presorted Key: customer.c_custkey
                      Full-sort Groups: 2  Sort Method: quicksort 
Average Memory: 27kB  Peak Memory: 27kB
                      ->  Nested Loop (cost=512217.20..678432.66 
rows=513262 width=44) (actual time=16094.663..16251.114 rows=63 loops=1)
                            ->  Merge Join (cost=512216.77..522360.54 
rows=128280 width=43) (actual time=16094.596..16250.531 rows=9 loops=1)
                                  Merge Cond: (customer.c_custkey = 
orders.o_custkey)
                                  ->  Index Scan using customer_pkey on 
customer  (cost=0.42..7524.45 rows=150000 width=23) (actual 
time=0.039..148.356 rows=147198 loops=1)
                                  ->  Materialize 
(cost=512216.29..512857.69 rows=128280 width=24) (actual 
time=16068.493..16068.529 rows=9 loops=1)
                                        ->  Sort 
(cost=512216.29..512536.99 rows=128280 width=24) (actual 
time=16068.478..16068.496 rows=9 loops=1)
                                              Sort Key: orders.o_custkey
                                              Sort Method: quicksort  
Memory: 25kB
                                              ->  Hash Join 
(cost=453626.90..498700.41 rows=128280 width=24) (actual 
time=15309.799..16068.390 rows=9 loops=1)
                                                    Hash Cond: 
(orders.o_orderkey = lineitem_1.l_orderkey)
                                                    ->  Seq Scan on 
orders  (cost=0.00..41136.00 rows=1500000 width=20) (actual 
time=0.011..439.488 rows=1500000 loops=1)
                                                    ->  Hash 
(cost=452023.40..452023.40 rows=128280 width=4) (actual 
time=15104.560..15104.562 rows=9 loops=1)
                                                          Buckets: 
131072  Batches: 1  Memory Usage: 1025kB
                                                          -> 
GroupAggregate  (cost=0.43..452023.40 rows=128280 width=4) (actual 
time=3731.232..15104.495 rows=9 loops=1)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
->  Index Scan using lineitem_pkey on lineitem lineitem_1 
(cost=0.43..416242.51 rows=6001659 width=9) (actual time=0.034..6936.304 
rows=6001215 loops=1)
                            ->  Index Scan using lineitem_pkey on 
lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.045..0.053 
rows=7 loops=9)
                                  Index Cond: (l_orderkey = 
orders.o_orderkey)
  Planning Time: 1.782 ms
  Execution Time: *16251.698 ms*
(32 rows)
Looking at the code, I think the problem with cost estimation in 
aggregation nodes is caused by incorrect selectivity estimation 
(cost_agg function) and can be solved using advanced statistics. But to 
be honest, I'm still investigating this.
-- 
Regards,
Alena Rybakina
Postgres Professional
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tender Wang | 2024-10-16 14:14:31 | Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault | 
| Previous Message | Amit Langote | 2024-10-16 12:00:19 | Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault |