From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Gordon Runkle <gar(at)integrated-dynamics(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Odd statistics behaviour in 7.2 |
Date: | 2002-02-13 16:21:55 |
Message-ID: | 29027.1013617315@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Gordon Runkle <gar(at)integrated-dynamics(dot)com> writes:
> I have a table with 1,066,673 rows. The column I'm interested in has
> this distribution of values:
> tdnr_ct | ct
> ---------+--------
> 16 | 1
> 4 | 1
> 3 | 58
> 2 | 68904
> 1 | 928171
> This means that 'ct' records have 'tdnr_ct' duplicate values.
I'm confused. You mean that there is one value that appears 16 times,
one that appears 4 times, etc etc, and 928171 values that appear only
once?
> Under v7.2, it only sometimes does. I've looked at the statistics,
> thanks to what I learned from Tom and Marc's discussion, and I see that
> sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column
> gets a value of '-1' (desireable), and other times a value such as 59483
> or something.
This seems quite bizarre; given those stats it's hard to see how you
could get anything but -1 or close to it, even with a very unlucky
statistical sampling. Don't suppose you'd want to trace through the
ANALYZE code and find out why it's computing a bad value?
Alternatively, if you could send me a dump of just the ct column,
I could try to reproduce the behavior here. (CREATE TABLE foo AS
SELECT ct FROM yourtab and then pg_dump -t foo should do it.)
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Ross J. Reedstrom | 2002-02-13 16:23:00 | Re: benchmarking postgres |
Previous Message | Marc G. Fournier | 2002-02-13 16:15:46 | Re: "Bug" in statistics for v7.2? |