| From: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> | 
|---|---|
| To: | "Greg Stark *EXTERN*" <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com> | 
| Cc: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: ANALYZE sampling is too good | 
| Date: | 2013-12-10 08:28:23 | 
| Message-ID: | A737B7A37273E048B164557ADEF4A58B17C7DCEF@ntex2010i.host.magwien.gv.at | 
| Lists: | pgsql-hackers | 
Greg Stark wrote:
>> It's also applicable for the other stats; histogram buckets constructed
>> from a 5% sample are more likely to be accurate than those constructed
>> from a 0.1% sample.   Same with nullfrac.  The degree of improved
>> accuracy, would, of course, require some math to determine.
> 
> This "some math" is straightforward basic statistics.  The 95%
> confidence interval for a sample of 300 rows drawn from a
> population of 1 million would be +/- 5.66%.  A sample of
> 1000 rows would have a 95% confidence interval of +/- 3.1%.
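[Editor's note: the figures quoted above can be sanity-checked with the standard worst-case margin of error for a sample proportion at p = 0.5; this is a sketch of that check, not code from the thread. The finite-population correction for N = 1,000,000 is negligible at these sample sizes and is omitted.]

```python
import math

def ci_margin(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion at confidence level z.

    p=0.5 is the worst case (maximum variance p*(1-p));
    z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

print(round(ci_margin(300) * 100, 2))   # ~5.66 (%), matching the quoted figure
print(round(ci_margin(1000) * 100, 2))  # ~3.1  (%), matching the quoted figure
```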
Doesn't all that assume a normally distributed random variable?
I don't think it can be applied to database table contents
without further analysis.
Yours,
Laurenz Albe