Re: Odd statistics behaviour in 7.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gordon Runkle <gar(at)integrated-dynamics(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Odd statistics behaviour in 7.2
Date: 2002-02-13 16:21:55
Message-ID: 29027.1013617315@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gordon Runkle <gar(at)integrated-dynamics(dot)com> writes:
> I have a table with 1,066,673 rows. The column I'm interested in has
> this distribution of values:

> tdnr_ct | ct
> ---------+--------
> 16 | 1
> 4 | 1
> 3 | 58
> 2 | 68904
> 1 | 928171

> This means that 'ct' records have 'tdnr_ct' duplicate values.

I'm confused. You mean that there is one value that appears 16 times,
one that appears 4 times, etc etc, and 928171 values that appear only
once?

> Under v7.2, it only sometimes does. I've looked at the statistics,
> thanks to what I learned from Tom and Marc's discussion, and I see that
> sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column
> gets a value of '-1' (desireable), and other times a value such as 59483
> or something.

This seems quite bizarre; given those stats it's hard to see how you
could get anything but -1 or close to it, even with a very unlucky
statistical sampling. Don't suppose you'd want to trace through the
ANALYZE code and find out why it's computing a bad value?

Alternatively, if you could send me a dump of just the ct column,
I could try to reproduce the behavior here. (CREATE TABLE foo AS
SELECT ct FROM yourtab and then pg_dump -t foo should do it.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ross J. Reedstrom 2002-02-13 16:23:00 Re: benchmarking postgres
Previous Message Marc G. Fournier 2002-02-13 16:15:46 Re: "Bug" in statistics for v7.2?