From: | Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Cc: | ftm(dot)van(dot)vugt(at)foxi(dot)nl |
Subject: | array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion |
Date: | 2013-10-19 13:38:40 |
Message-ID: | 3839201.Nfa2RvcheX@techfox.foxi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
L.S.
Something seems wrong here.... when applying arrag_agg() on a large recordset,
above a certain size 'things fall over' and memory usage races off until the
system runs out of it:
# select version();
version
-----------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.1,
64-bit
# \! free -g
total used free shared buffers cached
Mem: 31 1 30 0 0 0
-/+ buffers/cache: 0 30
Swap: 31 0 31
====================
Create concatenate function and aggregate:
CREATE OR REPLACE FUNCTION comma_cat(text, text)
RETURNS text
LANGUAGE 'plpgsql'
IMMUTABLE
STRICT
SECURITY INVOKER
AS '
BEGIN
IF $1 = '''' THEN
RETURN $2;
ELSIF $2 = '''' THEN
RETURN $1;
ELSE
RETURN $1 || '', '' || $2;
END IF;
END;';
CREATE AGGREGATE comcat(text) ( SFUNC = comma_cat, STYPE = text );
Activate timing:
\timing
Timing is on.
Create demo data:
create temp table f as
select id, random() as value
from generate_series(1, 1e7::int) as f(id);
Time: 7036,917 ms
====================
Don't mind the 'usefulness' of the exact query below, I ran into this issue
when experimenting a bit using random().
On my system, using the comcat() aggregate is no problem regardless of the
size of the recordset:
with g as (select * from f limit 1e5)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 189,835 ms
with g as (select * from f limit 1e6)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 1815,756 ms
with g as (select * from f)
select comcat(id::text), min(value)
from g
group by g.value
having count(1) > 1;
Time: 18660,326 ms
====================
However, using the array_agg() this breaks (on my system ) on the largest set:
with g as (select * from f limit 1e5)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 361,242 ms
with g as (select * from f limit 1e6)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: 3310,347 ms
with g as (select * from f)
select array_agg(id), min(value)
from g
group by g.value
having count(1) > 1;
Time: <none, fails>
=> the last query eats up all 32GB main memory in seconds, then starts on the
32GB swap (which obviously takes a bit longer to digest) until eventually the
child process gets killed by the oom-killer and postgresql restarts....
--
Best,
Frank.
From | Date | Subject | |
---|---|---|---|
Next Message | Valentine Gogichashvili | 2013-10-19 19:02:25 | Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion |
Previous Message | Honza Horak | 2013-10-18 11:55:09 | random() generates collisions too early |