Re: Big performance slowdown from 11.2 to 13.3

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Big performance slowdown from 11.2 to 13.3
Date: 2021-07-23 14:45:01
Message-ID: CAM+6J95CSOECcJekYKnmZ7tvKOLch+aXt8A9z1BTxkrP=6rjSA@mail.gmail.com
Lists: pgsql-performance

On Fri, 23 Jul 2021 at 03:06, ldh(at)laurent-hasson(dot)com <ldh(at)laurent-hasson(dot)com>
wrote:

> I am not sure I understand this parameter well enough, but it currently has
> a default value of 1000. I have read Robert's post
> (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html)
> and could play with those parameters, but I am unsure whether what you are
> describing will unlock this 2GB limit.

Yeah, maybe I was digressing, and this possibly won't get around the Windows
bottleneck.
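
For reference, these are the sort of cost knobs I mean; a minimal
session-level sketch (the values are only for experimenting, not a
recommendation):

postgres=# set parallel_setup_cost TO 0;              -- stop charging for launching workers
postgres=# set parallel_tuple_cost TO 0;              -- stop charging for worker-to-leader tuple transfer
postgres=# set max_parallel_workers_per_gather TO 4;  -- allow more workers per Gather node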

Although the query is different, the steps were:
1) use the system defaults, work_mem = 4MB, parallel_setup_cost = 1000
-- runs the query in parallel; no disk spill, as work_mem is sufficient for my data
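
(To confirm what the session was actually running with, the settings can be
checked via pg_settings:)

postgres=# select name, setting, unit from pg_settings where name in ('work_mem', 'parallel_setup_cost');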

postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date), year_actual from cte group by year_actual;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=931227.78..932398.85 rows=200 width=8) (actual time=7850.214..7855.848 rows=51 loops=1)
   Group Key: dates.year_actual
   ->  Finalize GroupAggregate  (cost=931227.78..932333.85 rows=4200 width=28) (actual time=7850.075..7855.611 rows=4201 loops=1)
         Group Key: dates.year_actual, dates.month_name, dates.day_name
         ->  Gather Merge  (cost=931227.78..932207.85 rows=8400 width=28) (actual time=7850.069..7854.008 rows=11295 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=930227.76..930238.26 rows=4200 width=28) (actual time=7846.419..7846.551 rows=3765 loops=3)
                     Sort Key: dates.year_actual, dates.month_name, dates.day_name
                     Sort Method: quicksort  Memory: 391kB
                     Worker 0:  Sort Method: quicksort  Memory: 392kB
                     Worker 1:  Sort Method: quicksort  Memory: 389kB
                     ->  Partial HashAggregate  (cost=929933.00..929975.00 rows=4200 width=28) (actual time=7841.979..7842.531 rows=3765 loops=3)
                           Group Key: dates.year_actual, dates.month_name, dates.day_name
                           Batches: 1  Memory Usage: 721kB
                           Worker 0:  Batches: 1  Memory Usage: 721kB
                           Worker 1:  Batches: 1  Memory Usage: 721kB
                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=3.347..4779.784 rows=8766240 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 7855.958 ms
(20 rows)

-- set work_mem to a very low value, to force a spill to disk and compare
-- the spill in parallel vs serial
postgres=# set work_mem TO 64;  -- 64kB, the minimum allowed
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date), year_actual from cte group by year_actual;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2867778.00..2868949.07 rows=200 width=8) (actual time=18116.529..18156.972 rows=51 loops=1)
   Group Key: dates.year_actual
   ->  Finalize GroupAggregate  (cost=2867778.00..2868884.07 rows=4200 width=28) (actual time=18116.421..18156.729 rows=4201 loops=1)
         Group Key: dates.year_actual, dates.month_name, dates.day_name
         ->  Gather Merge  (cost=2867778.00..2868758.07 rows=8400 width=28) (actual time=18116.412..18155.136 rows=11126 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=2866777.98..2866788.48 rows=4200 width=28) (actual time=17983.836..17984.981 rows=3709 loops=3)
                     Sort Key: dates.year_actual, dates.month_name, dates.day_name
                     Sort Method: external merge  Disk: 160kB
                     Worker 0:  Sort Method: external merge  Disk: 168kB
                     Worker 1:  Sort Method: external merge  Disk: 160kB
                     ->  Partial HashAggregate  (cost=2566754.38..2866423.72 rows=4200 width=28) (actual time=10957.390..17976.250 rows=3709 loops=3)
                           Group Key: dates.year_actual, dates.month_name, dates.day_name
                           Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: 457480kB
                           Worker 0:  Batches: 21  Memory Usage: 93kB  Disk Usage: *473056kB*
                           Worker 1:  Batches: 21  Memory Usage: 93kB  Disk Usage: *456792kB*
                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=1.042..5893.803 rows=8766240 loops=3)
 Planning Time: 0.142 ms
 Execution Time: 18195.973 ms
(20 rows)

postgres=# set parallel_setup_cost TO 1000000000;  -- make it never choose a parallel plan, then check disk spill (much more than with parallel workers)
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date), year_actual from cte group by year_actual;
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5884624.58..5884658.08 rows=200 width=8) (actual time=35462.340..35463.142 rows=51 loops=1)
   Group Key: cte.year_actual
   ->  Sort  (cost=5884624.58..5884635.08 rows=4200 width=8) (actual time=35462.325..35462.752 rows=4201 loops=1)
         Sort Key: cte.year_actual
         Sort Method: external merge  Disk: 80kB
         ->  Subquery Scan on cte  (cost=5165122.70..5884312.33 rows=4200 width=8) (actual time=21747.139..35461.371 rows=4201 loops=1)
               ->  HashAggregate  (cost=5165122.70..5884270.33 rows=4200 width=28) (actual time=21747.138..35461.140 rows=4201 loops=1)
                     Group Key: dates.year_actual, dates.month_name, dates.day_name
                     Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: *1393192kB*
                     ->  Seq Scan on dates  (cost=0.00..973764.20 rows=26298720 width=28) (actual time=0.005..10698.392 rows=26298721 loops=1)
 Planning Time: 0.124 ms
 Execution Time: 35548.514 ms
(12 rows)
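
Adding up the per-process spill from the parallel run above, the total comes
out almost identical to the serial run, so parallelism divides the spill
across workers rather than shrinking it:

postgres=# select pg_size_pretty((457480 + 473056 + 456792)::bigint * 1024) as parallel_total_spill,
postgres-#        pg_size_pretty(1393192::bigint * 1024) as serial_spill;
-- both come out to roughly 1.3 GB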

I was thinking that making the query run in parallel would reduce disk I/O
per worker and maybe speed up the aggregate, especially when it runs for
around an hour. Of course, this was just trying things out; maybe I am trying
to override the optimizer, but I just wanted to understand the cost
differences and resource usage when forcing custom plans.
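
If spreading that I/O over more workers is the goal, the worker count itself
can also be raised; a sketch, assuming the storage can keep up (the values
here are illustrative):

postgres=# alter table dimensions.dates set (parallel_workers = 4);  -- per-table hint overriding the size-based default
postgres=# set max_parallel_workers_per_gather TO 4;                 -- session cap per Gather/Gather Merge node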

I also tried with enable_sort off and enable_hashagg off (it only got worse,
so I am not sharing those results, as they would derail the thread).
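
For anyone reproducing those experiments, a minimal sketch to keep such
enable_* toggles scoped to a single transaction rather than the whole
session:

postgres=# begin;
postgres=# set local enable_hashagg TO off;  -- set local reverts automatically at commit/rollback
postgres=# -- rerun the explain analyze from above here
postgres=# rollback;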

Again, ignore this if it does not make sense :)
