Re: Performance Issue on Query 18 of TPC-H Benchmark

From: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark
Date: 2024-10-16 14:56:54
Message-ID: SEZPR06MB649463AD78819D69D1C060C28A462@SEZPR06MB6494.apcprd06.prod.outlook.com
Lists: pgsql-bugs

>I would like to know if you can improve that case by switching from the sorted group to a hashed one.

I used this patch so that only the first (outer) aggregate in the query plan becomes a HashAggregate:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0c7273b9cc..b410452df1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6983,8 +6983,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
-
-	if (can_sort)
+	static int	call_count = 0;
+	call_count++;
+	if (can_sort && call_count != 2)
 	{
 		/*
 		 * Use any available suitably-sorted path as input, and also consider

And got this query plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=693083.48..693083.73 rows=100 width=71) (actual time=2624.282..2629.392 rows=9 loops=1)
-> Sort (cost=693083.48..694352.29 rows=507522 width=71) (actual time=2624.281..2629.390 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=658421.05..673686.36 rows=507522 width=71) (actual time=2624.162..2629.346 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
Planned Partitions: 32 Batches: 1 Memory Usage: 793kB
-> Gather (cost=459569.18..608779.05 rows=507522 width=44) (actual time=2623.805..2629.229 rows=63 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=458569.18..557026.85 rows=211468 width=44) (actual time=2581.717..2620.494 rows=21 loops=3)
-> Parallel Hash Join (cost=458568.75..492734.09 rows=52844 width=43) (actual time=2581.704..2620.448 rows=3 loops=3)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Hash Join (cost=453562.50..487589.13 rows=52844 width=24) (actual time=2541.024..2579.759 rows=3 loops=3)
Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
-> Parallel Seq Scan on orders (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.028..32.135 rows=500000 loops=3)
-> Hash (cost=451977.19..451977.19 rows=126825 width=4) (actual time=2515.787..2515.788 rows=9 loops=3)
Buckets: 131072 Batches: 1 Memory Usage: 1025kB
-> GroupAggregate (cost=0.43..451977.19 rows=126825 width=4) (actual time=608.052..2515.758 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
-> Index Scan using lineitem_pkey on lineitem lineitem_1 (cost=0.43..416256.96 rows=6002623 width=9) (actual time=0.043..1399.708 rows=6001215 loops=3)
-> Parallel Hash (cost=4225.00..4225.00 rows=62500 width=23) (actual time=39.601..39.602 rows=50000 loops=3)
Buckets: 262144 Batches: 1 Memory Usage: 10304kB
-> Parallel Seq Scan on customer (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.032..15.561 rows=50000 loops=3)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.012..0.014 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 1.850 ms
Execution Time: 2630.023 ms
(30 rows)

Compared to the query plan with GroupAggregate, both the estimated cost and the execution time are similar; there is no significant difference.

>I think it was more interesting when I turned off parallelism and tried to build a query plan without AQO, and the execution time there was significantly reduced:
Do you mean that turning off parallelism alone brings this significant performance improvement?

Best regards,

Jinsheng Ba

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.
