From: | Zhenghua Lyu <zlyu(at)vmware(dot)com> |
---|---|
To: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0? |
Date: | 2020-10-26 08:42:52 |
Message-ID: | SN6PR05MB45594D08A1A1DF63ECA6CC2BB5190@SN6PR05MB4559.namprd05.prod.outlook.com |
Lists: | pgsql-hackers |
Hi hackers,
It seems the function `get_variable_numdistinct` ignores the case where `stanullfrac` is 1.0:
# create table t(a int, b int);
CREATE TABLE
# insert into t select i from generate_series(1, 10000)i;
INSERT 0 10000
gpadmin=# analyze t;
ANALYZE
# explain analyze select b, count(1) from t group by b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
HashAggregate (cost=195.00..197.00 rows=200 width=12) (actual time=5.928..5.930 rows=1 loops=1)
Group Key: b
Batches: 1 Memory Usage: 40kB
-> Seq Scan on t (cost=0.00..145.00 rows=10000 width=4) (actual time=0.018..1.747 rows=10000 loops=1)
Planning Time: 0.237 ms
Execution Time: 5.983 ms
(6 rows)
So the planner falls back to the default estimate of 200 groups, even though column b contains only NULLs and the query returns a single group.
I have added a few lines of code to take `stanullfrac == 1.0` into account. With the attached patch, we now get:
# explain analyze select b, count(1) from t group by b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
HashAggregate (cost=195.00..195.01 rows=1 width=12) (actual time=6.163..6.164 rows=1 loops=1)
Group Key: b
Batches: 1 Memory Usage: 24kB
-> Seq Scan on t (cost=0.00..145.00 rows=10000 width=4) (actual time=0.024..1.823 rows=10000 loops=1)
Planning Time: 0.535 ms
Execution Time: 6.344 ms
(6 rows)
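To make the idea concrete, here is a minimal standalone sketch of the decision logic, not the attached patch itself. The function name `sketch_numdistinct` and its simplified signature are hypothetical, and the real `get_variable_numdistinct` handles more cases (unique indexes, clamping, system columns) that are left out here. The constant mirrors the default of 200 seen in the first plan.

```c
#include <stdbool.h>

/* Mirrors the default estimate of 200 seen in the first plan above. */
#define DEFAULT_NUM_DISTINCT 200.0

/*
 * Hypothetical, simplified model of the estimate (an assumption for
 * illustration, not PostgreSQL source code):
 *   stadistinct > 0  : absolute number of distinct values
 *   stadistinct < 0  : negated fraction of the row count
 *   stadistinct == 0 : unknown, fall back to the default
 * The proposed special case: if every value is NULL, there is one group.
 */
static double
sketch_numdistinct(double stadistinct, double stanullfrac,
                   double ntuples, bool *isdefault)
{
    *isdefault = false;

    /* Column is entirely NULL: GROUP BY yields a single (NULL) group. */
    if (stanullfrac >= 1.0)
        return 1.0;

    if (stadistinct > 0.0)
        return stadistinct;

    if (stadistinct < 0.0 && ntuples > 0.0)
        return -stadistinct * ntuples;

    /* No usable statistics: report the default and flag it as such. */
    *isdefault = true;
    return DEFAULT_NUM_DISTINCT;
}
```

In this model, the all-NULL column from the example (unknown `stadistinct`, `stanullfrac` of 1.0) yields 1 instead of the default, which matches the change from rows=200 to rows=1 in the plans above.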
I am not sure how valuable this change is in practice, but it should be a step in the right direction.
Any comments on this are appreciated.
Attachment | Content-Type | Size |
---|---|---|
0001-Consider-the-case-when-stanullfrac-is-1.0-in-get_var.patch | application/octet-stream | 850 bytes |