From: | Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | More On 7.2 Distributions - Estimates For Number Distinct < 0 |
Date: | 2001-10-29 09:18:38 |
Message-ID: | 01102922183801.04563@spikey.slithery.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
In the process of attempting to understand the data in pg_stats, I created a
(very) simple example :
CREATE TABLE test(id integer);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(3);
INSERT INTO test VALUES(4);
INSERT INTO test VALUES(5);
ANALYZE test;
SELECT * FROM pg_stats WHERE tablename='test';
tablename test
attname id
null_frac 0
avg_width 4
n_distinct -0.5
most_common_vals {"1","2"}
most_common_vals {"0.4","0.3"}
histogram_bounds {"3","4","5"}
correlation 1
everything looks good except for n_distinct ( its negative - should be 5)
(I wasn't too worried about avg_width )
Using fairly crude tracing (adding elog calls) in
src/backend/commands/analyze.c :
DEBUG: Analyzing test
DEBUG: Analyze : beginning a column
DEBUG: Have 10 total values in relation (totalrows)
DEBUG: Have 10 values in relation (numrows)
DEBUG: Have 10 values in sample (values_cnt)
DEBUG: Have 5 distinct values in sample (ndistinct)
DEBUG: Have 2 multiple values in sample (nmultiple)
DEBUG: calc 5.000000 distinct via Chaudhuri rule
DEBUG: calc -0.500000 distinct via >10 percent rowcount rule
So we had the correct answer before applying the 10 percent rowcount code.
This 10 percent rowcount code being line 1340 or thereabouts :
if (stats->stadistinct > 0.1 * totalrows)
{
stats->stadistinct = -(stats->stadistinct / totalrows);
}
My example is pretty contrived, but I wonder if I have "stumbled" on a bug
here.
regards
Mark
From | Date | Subject | |
---|---|---|---|
Next Message | Andy Hallam | 2001-10-29 09:58:04 | SELECT with backslash '\' character |
Previous Message | Mark kirkwood | 2001-10-29 08:53:56 | Re: On Distributions In 7.2 |