From: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: improving GROUP BY estimation |
Date: | 2016-02-25 23:16:05 |
Message-ID: | 03A3B58D-6A2D-44FE-9B03-E21948CFCBAA@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Feb 23, 2016, at 5:12 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> <snip>
>
> So much better. Clearly, there are cases where this will over-estimate the cardinality - for example when the values are somehow correlated.
>
I applied your patch, which caused a few regression tests to fail. Attached
is a patch that includes the necessary changes to the expected test results.
It is not hard to write test cases where your patched version overestimates
the number of rows by a very similar factor as the old code underestimates
them. My very first test, which was not specifically designed to demonstrate
this, happens to be one such example:
CREATE TABLE t (a INT, b int);
INSERT INTO t SELECT sqrt(gs)::int, gs FROM generate_series(1,10000000) gs;
ANALYZE t;
EXPLAIN SELECT a FROM t WHERE b < 1000 GROUP BY a;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=169250.21..169258.71 rows=850 width=4)
Group Key: a
-> Seq Scan on t (cost=0.00..169247.71 rows=1000 width=4)
Filter: (b < 1000)
(4 rows)
SELECT COUNT(*) FROM (SELECT a FROM t WHERE b < 1000 GROUP BY a) AS ss;
count
-------
32
(1 row)
So, it estimates 850 rows where only 32 are returned . Without applying your patch,
it estimates just 1 row where 32 are returned. That's an overestimate of roughly 26 times,
rather than an underestimate of 32 times.
As a patch review, I'd say that your patch does what you claim it does, and it applies
cleanly, and passes the regression tests with my included modifications. I think there
needs to be some discussion on the list about whether the patch is a good idea.
Mark Dilger
Attachment | Content-Type | Size |
---|---|---|
estimate-num-groups-v2.txt | text/plain | 6.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2016-02-25 23:52:54 | Re: Support for N synchronous standby servers - take 2 |
Previous Message | Julien Rouhaud | 2016-02-25 22:55:44 | Re: create opclass documentation outdated |