From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Bug in huge simplehash
Date: 2021-08-13 04:50:50
Message-ID: CAApHDvp42UGXN+BhZx3gONfFq=OpOTByUX=9pwVrYG8hXEZAow@mail.gmail.com
Lists: pgsql-hackers
On Wed, 11 Aug 2021 at 00:10, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> Attached v2.
Eyeballing this, it looks fine, but I was a little nervous about backpatching
it without testing it properly, so I ended up provisioning a machine with
256GB of memory and doing a round of testing.
I just created the simplest table I could:
create table a (a bigserial, b int);
and inserted 2^31 rows.
insert into a (b) values(1);
insert into a (b) select b from a; -- repeated until I got 2^31 rows.
set work_mem = '256GB';
set max_parallel_workers_per_gather = 0;
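For anyone wanting to reproduce this, an equivalent way to populate the
table is a batched generate_series loop. This is just an untested sketch;
the repeated doubling above is what I actually ran:
do $$
begin
  -- 16 batches of 2^27 rows gives 2^31 rows in total; each row gets a
  -- distinct bigserial value in "a", so GROUP BY a sees 2^31 groups.
  for i in 1..16 loop
    insert into a (b) select 1 from generate_series(1, 134217728);
  end loop;
end
$$;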
I could recreate the issue described with the following query:
explain (analyze , timing off) select a from a group by a;
After watching perf top for a while it switched to:
98.90% postgres [.] tuplehash_grow
0.36% [kernel] [k] change_p4d_range
0.24% postgres [.] LookupTupleHashEntry
0.09% postgres [.] tts_minimal_store_tuple
0.07% [kernel] [k] vm_normal_page
0.02% [kernel] [k] __softirqentry_text_start
0.02% postgres [.] heap_fill_tuple
0.02% postgres [.] AllocSetAlloc
After patching I got:
explain (analyze , timing off) select a from a group by a;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=35149810.71..53983243.28 rows=1883343257 width=8) (actual rows=2147483648 loops=1)
   Group Key: a
   Batches: 1  Memory Usage: 201334801kB
   ->  Seq Scan on a  (cost=0.00..30441452.57 rows=1883343257 width=8) (actual rows=2147483648 loops=1)
 Planning Time: 0.105 ms
 Execution Time: 2173480.905 ms
(6 rows)
Time: 2173482.166 ms (36:13.482)
And, since I only had 256GB of memory on this machine and couldn't
really do 2^32 groups, I dropped SH_FILLFACTOR to 0.4 and
SH_MAX_FILLFACTOR to 0.48 and tried again to ensure I got the hash
table full message:
postgres=# explain (analyze on , timing off) select a from a group by a;
ERROR: hash table size exceeded
Time: 1148554.672 ms (19:08.555)
After doing that, I felt a bit better about back-patching it, so I did.
Thanks for the patch.
David