From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date: 2015-06-20 14:17:22
Message-ID: CA+TgmoYtOZyfFp47KBUvL5+Q=RZJcHM+Lk7=rd6cvihfk36c5A@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jun 17, 2015 at 1:52 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped over
> a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow
> illustrates the importance of the memory-bounded hashagg patch Jeff Davis is
> working on).
Stupid question, but why not just override it using ALTER TABLE ...
ALTER COLUMN ... SET (n_distinct = ...)?
I think it's been discussed quite often on previous threads that you
need to sample an awful lot of the table to get a good estimate for
n_distinct. We could support that, but it would be expensive, and it
would have to be done again every time the table is auto-analyzed.
The above syntax supports nailing the estimate to either an exact
value or a percentage of the table, and I'm not sure why that isn't
good enough.
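
For reference, a minimal sketch of that syntax (the table and column
names here are just a TPC-H example, not taken from this thread):

    -- Pin the estimate to an exact number of distinct values:
    ALTER TABLE lineitem ALTER COLUMN l_orderkey
        SET (n_distinct = 1500000000);

    -- Or pin it to a fraction of the table: negative values between
    -- -1 and 0 are treated as a ratio of the row count.
    ALTER TABLE lineitem ALTER COLUMN l_orderkey
        SET (n_distinct = -0.25);

    -- The override is picked up at the next ANALYZE:
    ANALYZE lineitem;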
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company