BUG #16624: Query Optimizer - Performance bug related to predicate simplification

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: XINYULIU(at)UMICH(dot)EDU
Subject: BUG #16624: Query Optimizer - Performance bug related to predicate simplification
Date: 2020-09-18 16:14:17
Message-ID: 16624-1375434d97659203@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16624
Logged by: XINYU LIU
Email address: XINYULIU(at)UMICH(dot)EDU
PostgreSQL version: 13rc1
Operating system: Ubuntu 20.04
Description:

Hello,

We are developing a tool for automatically finding performance bugs in
PostgreSQL. Our key insight is that given a pair of semantic equivalent
queries, a robust DBMS should return the same result within a similar
execution time. Significant time difference suggests a potential performance
bug in the DBMS.

We are sharing a pair of TPC-H queries that exhibit a potential performance
bug in this report:

First query:
SELECT "ps_suppkey"
FROM "partsupp"
WHERE "ps_partkey" = 1486;

Second query:
SELECT "ps_suppkey"
FROM "partsupp"
WHERE "ps_partkey" + 1486 = 2972;

[Actual Behavior]
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes only 1.059 millisecond, while the second query takes 247.176
millisecond. We think the time difference results from different plans
selected.

[Query Execution Plan]
* First query:
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using partsupp_pkey on partsupp (cost=0.43..4.59 rows=9
width=4) (actual time=0.692..0.694 rows=4 loops=1)
Index Cond: (ps_partkey = 1486)
Heap Fetches: 0
Planning Time: 4.748 ms
Execution Time: 1.059 ms
(5 rows)

* Second query:
QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.43..91865.94 rows=19994 width=4) (actual
time=2.032..246.821 rows=4 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using partsupp_pkey on partsupp
(cost=0.43..88866.54 rows=8331 width=4) (actual time=159.371..240.012 rows=1
loops=3)
Filter: ((ps_partkey + 1486) = 2972)
Rows Removed by Filter: 1333332
Heap Fetches: 0
Planning Time: 4.556 ms
Execution Time: 247.176 ms
(9 rows)

[Expected Behavior]
I would have expected the DBMS to run these two queries with similar
execution time, given that they both have the same semantics. Notably, the
execution time difference between these two queries will grow significantly
when the size of the database grows. On the TPC-H benchmark of scale factor
100, the first query takes 1.9 millisecond, while the second query takes 83
seconds.

[Test Environment]
Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23
00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"
PostgreSQL v13 beta3
Database: TPC-H benchmark (with scale factor 5)

[Steps for reproducing our observations]

* Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
* Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
echo $i
name=`echo $i|cut -d'.' -f1`
psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
* Execute the queries

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-09-18 21:22:30 Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification
Previous Message Tom Lane 2020-09-18 13:52:46 Re: BUG #16623: JSON select query result is getting differed when we change DB version