From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: significant slowdown of HashAggregate between 9.6 and 10
Date: 2020-06-03 15:32:47
Message-ID: CAFj8pRDLVakD5Aagt3yZeEQeTeEWaS3YE5h8XC3Q3qJ6TYkc2Q@mail.gmail.com
Lists: pgsql-hackers
Hi
A Czech Postgres user reported a performance issue related to the speed of
HashAggregate inside a nested loop.
The speed on 9.6:

HashAggregate (cost=27586.10..27728.66 rows=14256 width=24) (actual time=0.003..0.049 rows=39 loops=599203)

The speed on 10.7:

HashAggregate (cost=27336.78..27552.78 rows=21600 width=24) (actual time=0.011..0.156 rows=38 loops=597137)
So it looks like HashAgg is about 3x slower; combined with a brutal nested
loop, that is a problem.
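For the record, the 3x figure follows from the per-call actual times in the two plan fragments above (the arithmetic below is my own, not from the plans; the loop counts differ slightly between the two runs, so the extra-time estimate is rough):

```python
# Per-call HashAggregate times reported by EXPLAIN ANALYZE above (ms).
t_96 = 0.049    # 9.6, per loop
t_10 = 0.156    # 10.7, per loop
loops = 599203  # loop count from the 9.6 plan

ratio = t_10 / t_96
extra_s = (t_10 - t_96) * loops / 1000  # rough: loop counts differ between runs

print(f"HashAggregate per-call slowdown: {ratio:.1f}x")  # ~3.2x
print(f"extra time over all loops: ~{extra_s:.0f} s")
```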
I wrote a simple benchmark, and it really does look like our hash aggregate
is getting slower and slower.
create table foo(a int, b int, c int, d int, e int, f int);
insert into foo select random()*1000, random()*4, random()*4, random()*2,
random()*100, random()*100 from generate_series(1,2000000);
analyze foo;
9.6.7
postgres=# explain (analyze, buffers) select i from
generate_series(1,500000) g(i) where exists (select count(*) cx from foo
group by b, c, d having count(*) = i);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=807.485..3364.515 rows=74 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 499926
   Buffers: shared hit=12739, temp read=856 written=855
   SubPlan 1
     ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.006..0.006 rows=0 loops=500000)
           Group Key: foo.b, foo.c, foo.d
           Filter: (count(*) = g.i)
           Rows Removed by Filter: 75
           Buffers: shared hit=12739
           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.015..139.736 rows=2000000 loops=1)
                 Buffers: shared hit=12739
 Planning time: 0.276 ms
 Execution time: 3365.758 ms
(14 rows)
10.9
postgres=# explain (analyze, buffers) select i from
generate_series(1,500000) g(i) where exists (select count(*) cx from foo
group by b, c, d having count(*) = i);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=825.468..4919.063 rows=74 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 499926
   Buffers: shared hit=12739, temp read=856 written=855
   SubPlan 1
     ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.009..0.009 rows=0 loops=500000)
           Group Key: foo.b, foo.c, foo.d
           Filter: (count(*) = g.i)
           Rows Removed by Filter: 75
           Buffers: shared hit=12739
           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.025..157.887 rows=2000000 loops=1)
                 Buffers: shared hit=12739
 Planning time: 0.829 ms
 Execution time: 4920.800 ms
(14 rows)
master
postgres=# explain (analyze, buffers) select i from
generate_series(1,500000) g(i) where exists (select count(*) cx from foo
group by b, c, d having count(*) = i);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series g  (cost=0.00..28869973750.00 rows=250000 width=4) (actual time=901.639..6057.943 rows=74 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 499926
   Buffers: shared hit=12739, temp read=855 written=855
   SubPlan 1
     ->  HashAggregate  (cost=57739.00..57739.94 rows=1 width=20) (actual time=0.012..0.012 rows=0 loops=500000)
           Group Key: foo.b, foo.c, foo.d
           Filter: (count(*) = g.i)
           Peak Memory Usage: 37 kB
           Rows Removed by Filter: 75
           Buffers: shared hit=12739
           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.017..262.497 rows=2000000 loops=1)
                 Buffers: shared hit=12739
 Planning Time: 0.275 ms
   Buffers: shared hit=1
 Execution Time: 6059.266 ms
(16 rows)
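Summing up the three runs above (execution times copied from the plans; the relative factors are my own calculation):

```python
# Total execution times from the three EXPLAIN ANALYZE runs above (ms).
times = {"9.6.7": 3365.758, "10.9": 4920.800, "master": 6059.266}

base = times["9.6.7"]
for version, t in times.items():
    # master ends up roughly 1.8x the 9.6.7 time for the same query
    print(f"{version:7s} {t:8.1f} ms  ({t / base:.2f}x vs 9.6.7)")
```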
Regards
Pavel