From: | Artur Zając <azajac(at)ang(dot)com(dot)pl> |
---|---|
To: | <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Bad estimates |
Date: | 2017-11-22 14:29:54 |
Message-ID: | 03b401d3639e$5d7493a0$185dbae0$@ang.com.pl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
We have table created like this:
CREATE TABLE xyz AS SELECT generate_series(1,10000000,1) AS gs;
Now:
db=# explain analyze select * from xyz where gs&1=1;
QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------
Seq Scan on xyz (cost=0.00..260815.38 rows=68920 width=4) (actual
time=0.044..2959.728 rows=5000000 loops=1)
Filter: ((gs & 1) = 1)
Rows Removed by Filter: 5000000
Planning time: 0.133 ms
Execution time: 3340.886 ms
(5 rows)
And after adding additional clause to WHERE:
db=# explain analyze select * from xyz where gs&1=1 and gs&2=2;
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------
Seq Scan on xyz (cost=0.00..329735.50 rows=345 width=4) (actual
time=0.045..3010.430 rows=2500000 loops=1)
Filter: (((gs & 1) = 1) AND ((gs & 2) = 2))
Rows Removed by Filter: 7500000
Planning time: 0.106 ms
Execution time: 3176.355 ms
(5 rows)
And one more clause:
newrr=# explain analyze select * from xyz where gs&1=1 and gs&2=2 and
gs&4=4;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------
Seq Scan on xyz (cost=0.00..398655.62 rows=2 width=4) (actual
time=0.052..3329.422 rows=1250000 loops=1)
Filter: (((gs & 1) = 1) AND ((gs & 2) = 2) AND ((gs & 4) = 4))
Rows Removed by Filter: 8750000
Planning time: 0.119 ms
Execution time: 3415.839 ms
(5 rows)
As we can see estimates differs significally from the actual records count -
only three clauses are reducing estimated number of records from 10000000 to
2.
I noticed that each additional clause reduces the number about 200 times and
define DEFAULT_NUM_DISTINCT is responsible for this behaviur.
I think that this variable should be lower or maybe estimation using
DEFAULT_NUM_DISTTINCT should be done once per table.
Artur Zajac
From | Date | Subject | |
---|---|---|---|
Next Message | Alex Ignatov | 2017-11-22 14:44:18 | RE: Query became very slow after 9.6 -> 10 upgrade |
Previous Message | Dmitry Shalashov | 2017-11-22 14:29:20 | Re: Query became very slow after 9.6 -> 10 upgrade |