From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Greg Stark *EXTERN*" <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 08:28:23
Message-ID: A737B7A37273E048B164557ADEF4A58B17C7DCEF@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers
Greg Stark wrote:
>> It's also applicable for the other stats; histogram buckets constructed
>> from a 5% sample are more likely to be accurate than those constructed
>> from a 0.1% sample. Same with nullfrac. The degree of improved
>> accuracy would, of course, require some math to determine.
>
> This "some math" is straightforward basic statistics. The 95%
> confidence interval for a sample of 300 drawn from a population of
> 1 million would be +/- 5.66%. A sample of 1000 would have a 95%
> confidence interval of +/- 3.1%.
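For reference, those figures can be reproduced with the textbook
normal-approximation margin of error for a proportion. A minimal sketch
(the worst-case p = 0.5 and z = 1.96 are assumptions, not stated in the
thread; the finite population correction for N = 1 million is negligible
at these sample sizes):

    import math

    def margin_of_error(n, p=0.5, z=1.96, N=None):
        # 95% margin of error for a sampled proportion: z * sqrt(p(1-p)/n).
        # p = 0.5 is the worst case; z = 1.96 is the 95% normal quantile.
        moe = z * math.sqrt(p * (1 - p) / n)
        if N is not None:
            # Finite population correction; negligible when n << N.
            moe *= math.sqrt((N - n) / (N - 1))
        return moe

    print(margin_of_error(300, N=1_000_000))   # ~0.0566 -> +/- 5.66%
    print(margin_of_error(1000, N=1_000_000))  # ~0.0310 -> +/- 3.1%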
Doesn't all that assume a normally distributed random variable?
I don't think it can be applied to database table contents
without further analysis.
Yours,
Laurenz Albe