From: | Jinho Jung <jinhojun(at)usc(dot)edu> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Parallel query execution introduces performance regressions |
Date: | 2019-04-01 18:30:13 |
Message-ID: | CAF6pyKcGpbt3mgmVEZ-un3O8Y-Bo8FR-_2YTt0MF_c+-uCemKw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
We noticed that the following SQL query runs 3 times slower on the latest
version of PostgreSQL due to the “parallel query execution” feature.
Several other queries are affected by the same feature. We are planning to
report the discovered regressions in an academic paper. We would like to
get your feedback before doing so.
Here’s the time taken to execute this query on older (v9.5.16) and newer
versions (v11.2) of PostgreSQL (in milliseconds):
+-----------------------+--------+---------+---------+-----------+
| | scale1 | scale10 | scale50 | scale 300 |
+-----------------------+--------+---------+---------+-----------+
| v9.5.16 | 88 | 937 | 4721 | 27241 |
| v11.2 | 288 | 2822 | 13838 | 85081 |
+-----------------------+--------+---------+---------+-----------+
We have shared the following details below:
1) the associated query,
2) the commit that activated it,
3) our high-level analysis,
4) query execution plans in old and new versions of PostgreSQL, and
5) information on reproducing these regressions.
### QUERY
select
ref_0.ol_delivery_d as c1
from
public.order_line as ref_0
where EXISTS (
select
ref_1.i_im_id as c0
from
public.item as ref_1
where ref_0.ol_d_id <= ref_1.i_im_id
)
### COMMIT
77cd477 (Enable parallel query by default.)
We found several other queries exhibiting regressions related to this
commit.
### HIGH-LEVEL ANALYSIS
We believe that this regression is due to parallel queries being enabled by
default. Surprisingly, we found that even on a larger TPC-C database (scale
factor of 50, roughly 4GB of size), parallel scan is still slower than the
non-parallel execution plan in the old version.
### QUERY EXECUTION PLANS
[OLD version]
Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8)
(actual time=0.034..24981.895 rows=90017507 loops=1)
Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
-> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504
width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
-> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual
time=0.000..0.000 rows=1 loops=90017507)
-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4)
(actual time=0.006..0.006 rows=1 loops=1)
Planning time: 0.290 ms
Execution time: 27241.239 ms
[NEW version]
Gather (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual
time=0.265..82355.289 rows=90017507 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Semi Join (cost=0.00..88044485915.32 rows=12502431
width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
-> Parallel Seq Scan on order_line ref_0 (cost=0.00..1486400.93
rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4)
(actual time=0.001..0.001 rows=1 loops=90017507)
Planning Time: 0.319 ms
Execution Time: 85081.158 ms
### REPRODUCING REGRESSION
* The queries can be downloaded here:
https://gts3.org/~/jjung/tpcc/case4.tar.gz
* You can reproduce these results by using the setup described in:
https://www.postgresql.org/message-id/BN6PR07MB3409922471073F2B619A8CA4EE640%40BN6PR07MB3409.namprd07.prod.outlook.com
Best regards,
Jinho Jung
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2019-04-01 18:34:55 | Re: Parallel query execution introduces performance regressions |
Previous Message | ANASTACIO Tiago | 2019-04-01 13:49:44 | Re: BUG #15726: parallel queries failed ERROR: invalid name syntax CONTEXT: parallel worker |