From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | xinyuliu(at)umich(dot)edu |
Subject: | BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function |
Date: | 2020-09-19 03:32:41 |
Message-ID: | 16625-07403f18463add24@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 16625
Logged by: XINYU LIU
Email address: xinyuliu(at)umich(dot)edu
PostgreSQL version: 12.3
Operating system: Ubuntu 19.10
Description:
Hello,
We are developing a tool for automatically finding performance bugs in
PostgreSQL. Our key insight is that given a pair of semantic equivalent
queries, a robust DBMS should return the same result within a similar
execution time. Significant time difference suggests a potential performance
bug in the DBMS.
We are sharing a pair of TPC-H queries that exhibit a potential performance
bug in this report:
First query:
SELECT "s_suppkey"
FROM "supplier"
WHERE s_suppkey > 100;
Second query:
SELECT "s_suppkey"
FROM "supplier"
WHERE s_suppkey > 100
GROUP BY s_suppkey;
[Actual Behavior]
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes only 17 millisecond, while the second query takes 42
millisecond. We think the time difference results from different plans
selected.
[Query Execution Plan]
First query:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on supplier (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..14.898 rows=49900 loops=1)
Filter: (s_suppkey > 100)
Rows Removed by Filter: 100
Planning Time: 0.639 ms
Execution Time: 17.469 ms
(5 rows)
Second query:
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1858.77..2357.86 rows=49909 width=4) (actual
time=30.093..38.541 rows=49900 loops=1)
Group Key: s_suppkey
-> Seq Scan on supplier (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..13.656 rows=49900 loops=1)
Filter: (s_suppkey > 100)
Rows Removed by Filter: 100
Planning Time: 0.669 ms
Execution Time: 42.270 ms
(7 rows)
[Expected Behavior]
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL will return the same results in roughly the same amount of
time.
[Test Environment]
Ubuntu 19.10
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)
[Steps for reproducing our observations]
* Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
* Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
echo $i
name=`echo $i|cut -d'.' -f1`
psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
*Execute the queries
From | Date | Subject | |
---|---|---|---|
Next Message | Christophe Pettus | 2020-09-19 15:57:24 | Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function |
Previous Message | Tom Lane | 2020-09-18 21:22:30 | Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification |