From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H |
Date: | 2015-06-19 18:32:53 |
Message-ID: | CAMkU=1xWa5GvE3tsrcUsWCQaZ_mSjLtKGEAEbee5VbLFa47SCA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped
> over a pretty bad n_distinct underestimate, causing OOM in HashAgg (which
> somehow illustrates the importance of the memory-bounded hashagg patch Jeff
> Davis is working on).
>
> The problem is Q18, particularly this simple subquery:
>
> select l_orderkey
> from lineitem
> group by l_orderkey
> having sum(l_quantity) > 313;
>
> which is planned like this:
>
>                                    QUERY PLAN
> ---------------------------------------------------------------------------------
>  HashAggregate (cost=598510163.92..598515393.93 rows=418401 width=12)
>    Group Key: l_orderkey
>    Filter: (sum(l_quantity) > '313'::double precision)
>    -> Seq Scan on lineitem (cost=0.00..508509923.28 rows=18000048128 width=12)
> (4 rows)
>
> but sadly, in reality the l_orderkey cardinality looks like this:
>
> tpch=# select count(distinct l_orderkey) from lineitem;
> count
> ------------
> 4500000000
> (1 row)
>
> That's a helluva difference - not the usual one or two orders of
> magnitude, but 10000x underestimate.
>
Is the row order in the table correlated with the value of l_orderkey?
Could you create a copy of the table in random order and see whether it
exhibits the same estimation issue?
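
A minimal sketch of that test, in case it helps. The table name
lineitem_random is just a placeholder, and on a 3TB table the random sort
will be slow and need a lot of temporary space, so a large sampled subset
may be a more practical starting point:

```sql
-- How correlated is physical row order with l_orderkey in the original table?
-- (correlation near 1 or -1 means the rows are stored almost in key order)
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'lineitem' AND attname = 'l_orderkey';

-- Build a copy with the rows in random physical order.
-- lineitem_random is a hypothetical name chosen for this test.
CREATE TABLE lineitem_random AS
SELECT * FROM lineitem ORDER BY random();

-- Gather statistics on the copy and compare the estimate.
ANALYZE lineitem_random;

SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'lineitem_random' AND attname = 'l_orderkey';

-- Check whether the planner's group-count estimate changes.
EXPLAIN
SELECT l_orderkey
FROM lineitem_random
GROUP BY l_orderkey
HAVING sum(l_quantity) > 313;
```

Note that a negative n_distinct in pg_stats is a fraction of the row count
rather than an absolute number of distinct values.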
Cheers,
Jeff