From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Casey Duncan <casey(at)pandora(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Out of memory error during large hashagg |
Date: | 2006-09-19 08:51:41 |
Message-ID: | 1158655901.2586.40.camel@holly |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, 2006-09-18 at 14:08 -0700, Casey Duncan wrote:
> I've reported variants of this in the past, but this case is entirely
> repeatable.
>
> Executing this query:
>
> select st_id, min(seed_id) as "initial_seed_id", count(*) as
> "seed_count"
> from seed group by st_id;
>
> The query plan and table stats are:
>
> QUERY PLAN
> -----------------------------------------------------------------------
> HashAggregate (cost=1362694.83..1365164.68 rows=164656 width=16)
> -> Seq Scan on seed (cost=0.00..964065.62 rows=53150562 width=16)
>
>
> relname | relpages | reltuples
> ---------+----------+-------------
> seed | 428880 | 5.26984e+07
>
The hashagg doesn't yet have scroll to disk capability, so a bad
estimation of ndistinct will cause this to fail (at any release). This
is a known issue for certain distributions of data only. The workaround
is the one you know about already: enable_hashagg = off
I'm interested in collecting info on the distribution of data.
Can you post:
select tablename, attname, n_distinct from pg_stats
where attname = 'st_id';
select count(distinct st_id) from seed;
and also the table definition, including the PK
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Bharat Manmode | 2006-09-19 10:23:52 | Startup Error |
Previous Message | Jackie Leng | 2006-09-19 08:38:03 | Optimizer Bug of Debug Version |