BUG #16888: Reoccurring group by clauses makes system slower

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: liuxy(at)gatech(dot)edu
Subject: BUG #16888: Reoccurring group by clauses makes system slower
Date: 2021-02-23 02:28:15
Message-ID: 16888-05b803340050f901@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16888
Logged by: XINYU LIU
Email address: liuxy(at)gatech(dot)edu
PostgreSQL version: 12.3
Operating system: Ubuntu 20.04
Description:

Hello,

We have 2 TPC-H queries which fetch the same tuples but have significant
query execution time differences (3.8 times).

We are sharing a pair of TPC-H queries that exhibit this performance
difference:

First query:
SELECT "t13"."c_acctbal"
FROM (
SELECT "t11"."o_comment",
"t12"."c_nationkey",
"t12"."c_acctbal"
FROM (
SELECT *
FROM "orders"
WHERE "o_orderstatus" = 'F') AS "t11"

INNER JOIN
(
SELECT *
FROM "customer"
WHERE "c_nationkey" <= 4
AND "c_name" =
'Customer#000000005') AS "t12"
ON "t11"."o_custkey" > "t12"."c_custkey"
GROUP BY "t11"."o_comment",
"t12"."c_nationkey",
"t12"."c_acctbal") AS "t13"
GROUP BY "t13"."c_acctbal" offset 8 rowsFETCH next 18 rows only

Second query:
SELECT "t23"."c_acctbal"
FROM (
SELECT *
FROM "orders"
WHERE "o_orderstatus" = 'F') AS "t22"
INNER JOIN
(
SELECT *
FROM "customer"
WHERE "c_nationkey" <= 4
AND "c_name" = 'Customer#000000005') AS "t23"
ON "t22"."o_custkey" > "t23"."c_custkey"
GROUP BY "t23"."c_acctbal" offset 8 rowsFETCH next 18 rows only

Actual Behavior
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes over 17 seconds, while the second query only takes 4.5 seconds.
We think the time difference results from different plans selected.
Specifically, in the first (slow) query, the DBMS performs both the inner
GROUP BY operation and the outer GROUP BY operation.

Query Execution Plan
First query:
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=505911.54..505911.72 rows=18 width=6) (actual
time=17102.666..17102.666 rows=0 loops=1)
-> HashAggregate (cost=505911.46..505913.46 rows=200 width=6) (actual
time=17102.661..17102.664 rows=1 loops=1)
Group Key: customer.c_acctbal
-> Group (cost=360749.03..490676.64 rows=1218786 width=59)
(actual time=9164.577..15427.569 rows=3568877 loops=1)
Group Key: orders.o_comment, customer.c_nationkey,
customer.c_acctbal
-> Gather Merge (cost=360749.03..483059.22 rows=1015656
width=59) (actual time=9164.573..13459.724 rows=3568877 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Group (cost=359749.01..364827.29 rows=507828
width=59) (actual time=3119.972..4402.277 rows=1189626 loops=3)
Group Key: orders.o_comment,
customer.c_nationkey, customer.c_acctbal
-> Sort (cost=359749.01..361018.58 rows=507828
width=59) (actual time=3119.968..3676.916 rows=1218159 loops=3)
Sort Key: orders.o_comment,
customer.c_nationkey, customer.c_acctbal
Sort Method: external merge Disk:
257528kB
Worker 0: Sort Method: quicksort Memory:
25kB
Worker 1: Sort Method: quicksort Memory:
25kB
-> Nested Loop (cost=0.00..292526.20
rows=507828 width=59) (actual time=65.037..1141.994 rows=1218159 loops=3)
Join Filter: (orders.o_custkey >
customer.c_custkey)
Rows Removed by Join Filter: 8
-> Parallel Seq Scan on customer
(cost=0.00..22602.50 rows=1 width=14) (actual time=65.029..65.037 rows=0
loops=3)
Filter: ((c_nationkey <= 4) AND
((c_name)::text = 'Customer#000000005'::text))
Rows Removed by Filter:
250000
-> Seq Scan on orders
(cost=0.00..224219.21 rows=3656359 width=53) (actual time=0.018..2108.302
rows=3654501 loops=1)
Filter: (o_orderstatus =
'F'::bpchar)
Rows Removed by Filter:
3845499
Planning Time: 0.750 ms
Execution Time: 17146.464 ms
(26 rows)

Second query:
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=294796.05..294796.30 rows=1 width=6) (actual
time=4486.879..4486.879 rows=0 loops=1)
-> Group (cost=294795.81..294796.05 rows=1 width=6) (actual
time=4486.876..4486.878 rows=1 loops=1)
Group Key: customer.c_acctbal
-> Gather Merge (cost=294795.81..294796.05 rows=2 width=6)
(actual time=4486.873..4487.007 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=293795.79..293795.80 rows=1 width=6) (actual
time=1559.177..1559.177 rows=0 loops=3)
Sort Key: customer.c_acctbal
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Partial HashAggregate (cost=293795.77..293795.78
rows=1 width=6) (actual time=1559.140..1559.140 rows=0 loops=3)
Group Key: customer.c_acctbal
-> Nested Loop (cost=0.00..292526.20
rows=507828 width=6) (actual time=63.752..1014.437 rows=1218159 loops=3)
Join Filter: (orders.o_custkey >
customer.c_custkey)
Rows Removed by Join Filter: 8
-> Parallel Seq Scan on customer
(cost=0.00..22602.50 rows=1 width=10) (actual time=63.734..63.738 rows=0
loops=3)
Filter: ((c_nationkey <= 4) AND
((c_name)::text = 'Customer#000000005'::text))
Rows Removed by Filter: 250000
-> Seq Scan on orders
(cost=0.00..224219.21 rows=3656359 width=4) (actual time=0.045..2108.431
rows=3654501 loops=1)
Filter: (o_orderstatus =
'F'::bpchar)
Rows Removed by Filter: 3845499
Planning Time: 0.418 ms
Execution Time: 4487.097 ms
(24 rows)

Expected Behavior
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL will evaluate them in roughly the same amount of time. It looks
to me that the query optimizer does not realize the inner GROUP BY clause is
unnecessary given the outer GROUP BY clause.

Test Environment
Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23
00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)

Here are the steps for reproducing our observations:

1. Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
2. Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
echo $i
name=`echo $i|cut -d'.' -f1`
psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
3. Execute the queries

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2021-02-23 03:27:55 BUG #16889: Suboptimal behavior related to join condition
Previous Message Santosh Udupi 2021-02-23 02:20:47 Re: pg_restore - generated column - not populating