From: Felix Geisendörfer <felix(at)felixge(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Extremely slow HashAggregate in simple UNION query
Date: 2019-08-20 15:11:58
Message-ID: 9ADCDD19-555B-4DB0-B00C-1852AC361557@felixge.de
Lists: pgsql-performance
Hi all,
today I debugged a query that was executing about 100x slower than expected, and was very surprised by what I found.
I'm posting to this list to see if this might be an issue that should be fixed in PostgreSQL itself.
Below is a simplified version of the query in question:
SET work_mem='64MB';
EXPLAIN ANALYZE
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
UNION
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;
HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual time=19.349..23.123 rows=1 loops=1)
  Group Key: a.a, b.b
  ->  Append  (cost=0.01..70020.01 rows=2000000 width=8) (actual time=0.022..0.030 rows=2 loops=1)
        ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8) (actual time=0.021..0.022 rows=1 loops=1)
              ->  Function Scan on generate_series a  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.014..0.014 rows=1 loops=1)
              ->  Function Scan on generate_series b  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
        ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Function Scan on generate_series a_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
              ->  Function Scan on generate_series b_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.101 ms
Execution Time: 45.986 ms
As you can see, it takes over 45 ms (!) to execute what appears to be a very simple query. How is this possible?
Based on my debugging, I think the following is going on:
1. The planner overestimates the number of final output rows by a factor of 2 million. [1]
2. The executor uses the bad estimate to allocate a huge hash table [2], and the increased work_mem [3] gives it enough rope to hang itself [4].
3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of the query execution instead of 45ms [5].
I'm certainly a novice when it comes to PostgreSQL internals, but I'm wondering if this could be fixed by taking a more dynamic approach for allocating HashAggregate hash tables?
Anyway, for my query, using UNION ALL was acceptable, so I'm not too troubled by this. I was just really caught by surprise that bad estimates can not only cause bad query plans, but can also cause good query plans to execute extremely slowly. I had never seen that before.
Best Regards
Felix
[1] My actual query had bad estimates for other reasons (GIN Index), but that's another story. The query above was of course deliberately designed to have bad estimates.
[2] nodeAgg.c: build_hash_table()
[3] A lot of queries in my application benefit from increased work_mem.
[4] execGrouping.c: nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));
[5] In my actual query it was even worse, only 6 out of 40ms of the total execution time were reported as being spent in the query nodes.