| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "Gordon A(dot) Runkle" <gar(at)integrated-dynamics(dot)com> |
| Cc: | pgsql-hackers(at)postgreSQL(dot)org |
| Subject: | Re: Odd statistics behaviour in 7.2 |
| Date: | 2002-02-16 17:57:19 |
| Message-ID: | 20368.1013882239@sss.pgh.pa.us |
| Lists: | pgsql-hackers |

BTW, while we're thinking about this, there's another aspect of the
number-of-distinct-values estimator that could use some peer review.
That's the decision whether to assume that the number of distinct
values in a column is fixed, or will vary with the size of the
table. (For example, in a boolean column, ndistinct should clearly
be 2 no matter how large the table gets; but in any unique column
ndistinct should equal the table size.) This is important since there
are times when we update the table size estimate (pg_class.reltuples)
without recomputing the statistics in pg_statistic. The "negative
stadistinct" convention in pg_statistic is used to signal which case
ANALYZE thinks applies.
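
To make the convention concrete, here is a minimal sketch of how a reader of pg_statistic might turn a stored stadistinct value into an absolute count. The function name and signature are invented for illustration and are not taken from the PostgreSQL source; the sketch assumes a negative value is the negated fraction of rows that are distinct, and it does not treat a value of 0 (unknown) specially.

```python
def estimated_ndistinct(stadistinct: float, reltuples: float) -> float:
    """Turn a stored stadistinct value into an absolute distinct-value count.

    Hypothetical helper, for illustration only.
    """
    if stadistinct < 0:
        # Negative: the magnitude is a fraction of the row count, so the
        # estimate follows whatever reltuples currently says.
        return -stadistinct * reltuples
    # Positive: a fixed count that does not depend on table size.
    return stadistinct


# Boolean-like column: stays at 2 however large the table gets.
print(estimated_ndistinct(2, 1_000_000))
# Unique column stored as -1.0: tracks the updated row count.
print(estimated_ndistinct(-1.0, 5000))
```

The point of the negative form is that updating reltuples alone is enough to keep the estimate sensible, with no need to recompute the statistics.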

Presently the decision is pretty simplistic: if the estimated number
of distinct values is more than 10% of the number of rows, then assume
the number of distinct values scales with the number of rows.
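
The rule just described can be sketched as follows. This is an illustration of the decision, not the actual ANALYZE code: the function and constant names are made up, and the guard against an empty table is an added assumption.

```python
SCALING_THRESHOLD = 0.10  # the 10% cutoff described above


def encode_stadistinct(ndistinct_est: float, total_rows: float,
                       threshold: float = SCALING_THRESHOLD) -> float:
    """Choose between a fixed count and a scaling fraction.

    Hypothetical helper, for illustration only.
    """
    if total_rows > 0 and ndistinct_est > threshold * total_rows:
        # More than `threshold` of the rows are distinct: assume the count
        # grows with the table and store it as a negated fraction of rows.
        return -(ndistinct_est / total_rows)
    # Otherwise assume the set of values is fixed and store the count itself.
    return ndistinct_est


print(encode_stadistinct(2, 1000))     # boolean-like column: kept as a count
print(encode_stadistinct(1000, 1000))  # unique column: stored as a fraction
```

Passing a different `threshold` makes it easy to see which columns would switch categories under an alternative cutoff.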

I believe that some rule of this form is reasonable, but the 10%
threshold was just picked out of the air. Can anyone suggest an
argument in favor of some other value, or a better way to look at it?

regards, tom lane