Performance Issue on Query 18 of TPC-H Benchmark

From: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Performance Issue on Query 18 of TPC-H Benchmark
Date: 2024-10-15 18:28:47
Message-ID: SEZPR06MB6494B9CADF2FB15CDBCC7BBF8A452@SEZPR06MB6494.apcprd06.prod.outlook.com
Lists: pgsql-bugs

For query 18 of the TPC-H benchmark:
select
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
    sum(l_quantity)
from
    CUSTOMER, ORDERS, LINEITEM
where
    o_orderkey in (
        select l_orderkey
        from LINEITEM
        group by l_orderkey
        having sum(l_quantity) > 314)
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by
    o_totalprice desc, o_orderdate
limit 100;

With 1GB of data (https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=drive_link), its query plan (EXPLAIN ANALYZE) is as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=666783.65..666783.90 rows=100 width=71) (actual time=2511.074..2517.146 rows=9 loops=1)
-> Sort (cost=666783.65..668052.46 rows=507522 width=71) (actual time=2511.073..2517.144 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=583237.80..647386.53 rows=507522 width=71) (actual time=2511.057..2517.137 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
-> Gather Merge (cost=583237.80..636813.14 rows=422936 width=71) (actual time=2511.053..2517.128 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=582237.78..586995.81 rows=211468 width=71) (actual time=2507.789..2507.795 rows=3 loops=3)
Group Key: customer.c_custkey, orders.o_orderkey
-> Sort (cost=582237.78..582766.45 rows=211468 width=44) (actual time=2507.783..2507.786 rows=21 loops=3)
Sort Key: customer.c_custkey, orders.o_orderkey
Sort Method: quicksort Memory: 26kB
Worker 0: Sort Method: quicksort Memory: 26kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=458569.18..557026.85 rows=211468 width=44) (actual time=2464.821..2507.757 rows=21 loops=3)
-> Parallel Hash Join (cost=458568.75..492734.09 rows=52844 width=43) (actual time=2464.793..2507.699 rows=3 loops=3)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Hash Join (cost=453562.50..487589.13 rows=52844 width=24) (actual time=2433.507..2476.356 rows=3 loops=3)
Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
-> Parallel Seq Scan on orders (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.024..33.467 rows=500000 loops=3)
-> Hash (cost=451977.19..451977.19 rows=126825 width=4) (actual time=2412.836..2412.836 rows=9 loops=3)
Buckets: 131072 Batches: 1 Memory Usage: 1025kB
-> GroupAggregate (cost=0.43..451977.19 rows=126825 width=4) (actual time=708.272..2412.797 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
-> Index Scan using lineitem_pkey on lineitem lineitem_1 (cost=0.43..416256.96 rows=6002623 width=9) (actual time=0.052..1331.820 rows=6001215 loops=3)
-> Parallel Hash (cost=4225.00..4225.00 rows=62500 width=23) (actual time=30.683..30.683 rows=50000 loops=3)
Buckets: 262144 Batches: 1 Memory Usage: 10304kB
-> Parallel Seq Scan on customer (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.019..11.368 rows=50000 loops=3)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.016..0.017 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 0.833 ms
Execution Time: 2517.189 ms
(36 rows)

However, I found that if we disable the path sorting here:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0c7273b9cc..91320f473a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6928,7 +6928,7 @@ make_ordered_path(PlannerInfo *root, RelOptInfo *rel, Path *path,
path->pathkeys,
&presorted_keys);

- if (!is_sorted)
+ if (false)
{
/*
* Try at least sorting the cheapest path and also try incrementally

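For context, make_ordered_path normally charges an explicit Sort on top of any path that does not already satisfy the required ordering; the hack above suppresses that charge, so a cheaper unsorted path (here the Seq Scan on lineitem) can win over the pre-sorted Index Scan. A toy sketch of that trade-off (the sort cost below is a made-up figure, not PostgreSQL's actual cost model; only the two scan costs come from the plans):

```python
# Toy model of the choice make_ordered_path influences (illustrative only;
# PostgreSQL's real cost model is far more involved).
def ordered_path_cost(path_cost, is_sorted, sort_cost):
    """Cost of delivering a path's rows in the required sort order."""
    if is_sorted:
        return path_cost          # input already ordered: no extra work
    return path_cost + sort_cost  # otherwise an explicit Sort is charged

# Scan costs taken from the two plans; the sort cost is hypothetical.
index_scan = ordered_path_cost(416257, True, 0)        # pre-sorted index scan
seq_scan   = ordered_path_cost(172626, False, 250000)  # seq scan + Sort

# Normally the pre-sorted index scan wins (416257 < 422626); with the sort
# charge suppressed, as the patch does, the bare seq scan (172626) is cheaper.
```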
both the execution time and the estimated cost are reduced (by roughly 34% and 41%, respectively):
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=390889.59..390889.84 rows=100 width=71) (actual time=1652.572..1655.298 rows=9 loops=1)
-> Sort (cost=390889.59..392158.39 rows=507522 width=71) (actual time=1652.571..1655.296 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=215938.45..371492.46 rows=507522 width=71) (actual time=1651.864..1655.256 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
-> Gather (cost=215938.45..360919.08 rows=422936 width=71) (actual time=1651.670..1655.245 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=214938.45..317625.48 rows=211468 width=71) (actual time=1616.827..1648.580 rows=3 loops=3)
Group Key: customer.c_custkey, orders.o_orderkey
-> Nested Loop (cost=214938.45..313396.12 rows=211468 width=44) (actual time=1609.796..1648.571 rows=21 loops=3)
-> Parallel Hash Join (cost=214938.02..249103.36 rows=52844 width=43) (actual time=1609.777..1648.532 rows=3 loops=3)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Hash Join (cost=209931.77..243958.39 rows=52844 width=24) (actual time=1573.950..1612.634 rows=3 loops=3)
Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
-> Parallel Seq Scan on orders (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.016..32.211 rows=500000 loops=3)
-> Hash (cost=208346.45..208346.45 rows=126825 width=4) (actual time=1549.849..1549.849 rows=9 loops=3)
Buckets: 131072 Batches: 1 Memory Usage: 1025kB
-> GroupAggregate (cost=0.00..208346.45 rows=126825 width=4) (actual time=334.397..1549.837 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1500388
-> Seq Scan on lineitem lineitem_1 (cost=0.00..172626.23 rows=6002623 width=9) (actual time=0.051..438.226 rows=6001215 loops=3)
-> Parallel Hash (cost=4225.00..4225.00 rows=62500 width=23) (actual time=34.762..34.763 rows=50000 loops=3)
Buckets: 262144 Batches: 1 Memory Usage: 10304kB
-> Parallel Seq Scan on customer (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.011..10.406 rows=50000 loops=3)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.009..0.011 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 1.644 ms
Execution Time: 1655.490 ms
(31 rows)

The major difference between the two query plans is that the first one contains an additional Sort step. I believe the second query plan is more efficient. Similar to my last report, perhaps the code can be optimized to enable it.

I also tried 10GB of data, where the execution time is reduced from 30s to 16s, although the estimated cost is increased. I can provide more information if needed.
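For reference, the reductions quoted above work out as follows (a quick sanity check over the figures already shown):

```python
# Check of the improvements reported above, using the figures from the
# two EXPLAIN ANALYZE outputs (1GB) and the 10GB timings.
def pct_reduction(before, after):
    """Percentage reduction from 'before' to 'after'."""
    return round(100 * (before - after) / before, 1)

print(pct_reduction(666783.90, 390889.84))  # estimated cost (1GB):  41.4
print(pct_reduction(2517.189, 1655.490))    # execution time (1GB):  34.2
print(pct_reduction(30, 16))                # execution time (10GB): 46.7
```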

Best regards,

Jinsheng Ba

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.
