BUG #10035: PostgreSQL nodes's estimate rows bug? when alter column set statistics 0

From:	digoal(at)126(dot)com
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	BUG #10035: PostgreSQL nodes's estimate rows bug? when alter column set statistics 0
Date:	2014-04-15 07:02:42
Message-ID:	20140415070242.15390.25651@wrigleys.postgresql.org
Lists:	pgsql-bugs

The following bug has been logged on the website:

Bug reference: 10035
Logged by: digoal.zhou
Email address: digoal(at)126(dot)com
PostgreSQL version: 9.3.3
Operating system: CentOS 6.4 x64
Description:

Is there a bug in costsize.c?
When I set a column's statistics target to zero, how can the planner still
compute a selectivity for that column?

TEST :

Create an ordinary table and leave column statistics enabled.
digoal=# create table t(id int, info text);
CREATE TABLE
digoal=# insert into t select 1,'test' from generate_series(1,1000000);
INSERT 0 1000000
digoal=# analyze t;
ANALYZE
digoal=# select * from pg_stats where tablename ='t';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
 public     | t         | id      | f         |         0 |         4 |          1 | {1}              | {1}               |                  |           1 |                   |                        |
 public     | t         | info    | f         |         0 |         5 |          1 | {test}           | {1}               |                  |           1 |                   |                        |
(2 rows)

Create another table and disable statistics on both columns.
digoal=# create table t1(id int, info text);
CREATE TABLE
digoal=# alter table t1 alter column id set statistics 0;
ALTER TABLE
digoal=# alter table t1 alter column info set statistics 0;
ALTER TABLE
digoal=# insert into t1 select 1,'test' from generate_series(1,1000000);
INSERT 0 1000000
digoal=# analyze t1;
ANALYZE
digoal=# select * from pg_stats where tablename ='t1';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
(0 rows)
With statistics disabled, pg_stats has no entries for the columns of t1.
Next, create an index on the id column of each table.
digoal=# create index idx_t_id on t(id);
digoal=# create index idx_t1_id on t1(id);
digoal=# select * from pg_stats where tablename ='t1';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
(0 rows)
After the index is created, t1 still has no statistics. That is expected,
because CREATE INDEX does not generate column statistics on its own, and
even if it did, statistics collection is disabled for these columns.
The page and tuple counts in pg_class also differ between the two tables:
the values for t are exact, while reltuples for t1 is only an estimate,
although a close one.
digoal=# select relname,reltuples,relpages from pg_class where relname in
('t','t1');
 relname | reltuples | relpages
---------+-----------+----------
 t       |     1e+06 |     1345
 t1      |    999468 |     1345
(2 rows)

Query for id = 1 and look at the execution plan. For t1 the planner uses
the index, but the row estimate is 4997.
digoal=# explain select * from t1 where id=1;
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_t1_id on t1  (cost=0.30..1436.75 rows=4997 width=36)
   Index Cond: (id = 1)
(2 rows)
For t the planner does not use the index, and the row estimate is 1000000.
Why is the difference so large?
digoal=# explain select * from t where id=1;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on t  (cost=0.00..13845.00 rows=1000000 width=9)
   Filter: (id = 1)
(2 rows)
In theory t1 has no column statistics, so the planner should have nothing
to base a row estimate on, yet it estimates 4997 rows.
The estimation logic can be found in costsize.c, and in this case the
result is clearly not reasonable.
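
One possible explanation for the 4997 figure, as a rough sketch only. It assumes that when a column has no statistics the planner falls back to a guess of 200 distinct values (the DEFAULT_NUM_DISTINCT constant in selfuncs.h), which gives an equality selectivity of 1/200 = 0.005. The function and macro names below are hypothetical and are not PostgreSQL's own.

```c
#include <assert.h>

/* Assumption: the number of distinct values guessed for a column that has
 * no statistics (mirrors DEFAULT_NUM_DISTINCT in PostgreSQL's selfuncs.h). */
#define ASSUMED_DEFAULT_NUM_DISTINCT 200.0

/* Hypothetical helper: selectivity of "col = const" when no statistics
 * exist, assuming every distinct value is equally common. */
static double fallback_eq_selectivity(double reltuples)
{
    double ndistinct = ASSUMED_DEFAULT_NUM_DISTINCT;

    assert(reltuples >= 0.0);
    /* A table cannot have more distinct values than rows. */
    if (reltuples > 0.0 && reltuples < ndistinct)
        ndistinct = reltuples;
    return 1.0 / ndistinct;
}

/* Hypothetical helper: unrounded row estimate for "col = const". */
static double raw_row_estimate(double reltuples)
{
    return fallback_eq_selectivity(reltuples) * reltuples;
}
```

With reltuples = 999468 from pg_class, raw_row_estimate() gives about 4997.34, which is in line with rows=4997 in the plan for t1. For t, by contrast, pg_stats lists the value 1 with frequency 1, so a selectivity of 1.0 and an estimate of 1000000 rows would be expected.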

The selectivity estimate code, in cost_index() in costsize.c:
    /*
     * Call index-access-method-specific code to estimate the processing cost
     * for scanning the index, as well as the selectivity of the index (ie,
     * the fraction of main-table tuples we will have to retrieve) and its
     * correlation to the main-table tuple order.
     */
    OidFunctionCall7(index->amcostestimate,
                     PointerGetDatum(root),
                     PointerGetDatum(path),
                     Float8GetDatum(loop_count),
                     PointerGetDatum(&indexStartupCost),
                     PointerGetDatum(&indexTotalCost),
                     PointerGetDatum(&indexSelectivity),
                     PointerGetDatum(&indexCorrelation));

    /*
     * Save amcostestimate's results for possible use in bitmap scan planning.
     * We don't bother to save indexStartupCost or indexCorrelation, because a
     * bitmap scan doesn't care about either.
     */
    path->indextotalcost = indexTotalCost;
    path->indexselectivity = indexSelectivity;

    /* all costs for touching index itself included here */
    startup_cost += indexStartupCost;
    run_cost += indexTotalCost - indexStartupCost;

    /* estimate number of main-table tuples fetched */
    tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);
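
To make the last line of the excerpt concrete, here is a minimal standalone sketch of how the fetched-tuple count follows from a selectivity and a tuple count. It is not the PostgreSQL source: clamp_row_est_sketch() and tuples_fetched_sketch() are hypothetical stand-ins, and the rounding uses a plain cast rather than whatever clamp_row_est() does internally, so ties may round differently.

```c
/* Hypothetical stand-in for clamp_row_est(): force the estimate to be at
 * least one row and round it to a whole number. Only meant for row counts
 * that fit comfortably in a long long. */
static double clamp_row_est_sketch(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    /* Round half up via truncation; valid because nrows is positive here. */
    return (double) (long long) (nrows + 0.5);
}

/* Hypothetical stand-in for the tuples_fetched computation in cost_index():
 * selectivity is the fraction of table rows the index condition matches. */
static double tuples_fetched_sketch(double index_selectivity, double rel_tuples)
{
    return clamp_row_est_sketch(index_selectivity * rel_tuples);
}
```

As a usage example, tuples_fetched_sketch(0.005, 999468) returns 4997. The selectivity 0.005 is an assumption, picked because 4997 / 999468 is roughly 0.005; the report itself does not show the value of indexSelectivity.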
