From: | Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: On Distributions In 7.2.1 |
Date: | 2002-05-04 04:49:24 |
Message-ID: | 1020487765.1259.23.camel@spikey.slithery.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>On Fri, 2002-05-03 at 02:11, Tom Lane wrote:
> Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> writes:
> > However Tom's observation is still valid (in spite of my math) - all the
> > frequencies are overestimated, rather than the expected "some bigger,
> > some smaller" sort of thing.
>
> No, that makes sense. The values that get into the most-common-values
> list are only going to be ones that are significantly more common (in
> the sample) than the estimated average frequency. So if the thing makes
> a good estimate of the average frequency, you'll only see upside
> outliers in the MCV list. The relevant logic is in analyze.c:
>
>(snippage)
oh I see... I am thinking that a larger sample may reduce the likelihood
of sample values *not* being included in my MCV list. A quick ALTER ....
SET STATISTICS 500 + ANALYZE results in frequencies of around the 0.0004
area for each MCV - closer to the actual of 0.00033.
So the frequency estimation algorithm is behaving well in this case, and
appears to be coverging to the correct results (or within a sensible
neighbourhood of them...). So for this type of data (uniformly
distributed keys) one can ANALYZE with confidence...
I notice that for a resonably large number of keys + big table
conbination using 100 quantiles gives much more accurate frequencies -
I wonder if its worth a mention in the docs to the effect :
"ANALYZE with the default (10 or so) is ok for most cases, but for big
tables consider using ALTER ... SET STATISTICS 100 for commonly JOINed
or WHEREed columns"
(Next ...the logarithmic guy...)
regards
Mark
From | Date | Subject | |
---|---|---|---|
Next Message | Jurgen Defurne | 2002-05-04 05:39:44 | Re: Foxpro |
Previous Message | Tom Lane | 2002-05-04 00:34:27 | Re: Subject: bool / vacuum full bug followup part 2 |