From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | len(at)pdx(dot)edu, len(at)cs(dot)pdx(dot)edu, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Understanding histograms |
Date: | 2008-04-30 22:47:02 |
Message-ID: | 1209595622.14025.112.camel@dogma.ljc.laika.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Wed, 2008-04-30 at 10:43 -0400, Tom Lane wrote:
> > Instead I would expect an estimate of "rows=0" for values of const
> > that are not in the MCV list and not in the histogram.
>
> Surely that's not very sane? The MCV list plus histogram generally
> don't include every value in the table. IIRC the estimate for values
> not present in the MCV list is (1 - sum(MCV frequencies)) divided by
> (n_distinct - number of MCV entries), which amounts to assuming that
> all values not present in the MCV list occur equally often. The weak
> spot of course is that the n_distinct estimate may be pretty inaccurate.
My understanding of Len's question is that, although the MCV list plus
the histogram don't include every distinct value in the general case,
they do include every value in the specific case where the histogram is
not full.
Essentially, this seems like using the histogram to extend the MCV list
such that, together, they represent all distinct values. This idea only
seems to help when the number of distinct values is greater than the
max size of MCVs, but less than the max size of MCVs plus histogram
bounds.
I'm not sure how much of a gain this is, because right now that could
be accomplished by increasing the statistics for that column (and
therefore all of your distinct values would fit in the MCV list). Also
the statistics aren't guaranteed to be perfectly up-to-date, so an
estimate of zero might be risky.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-04-30 23:17:44 | Re: Understanding histograms |
Previous Message | david | 2008-04-30 20:47:00 | Re: Postgres replication |