From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Arne Scheffer <arne(dot)scheffer(at)uni-muenster(dot)de>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Add min and max execute statement time in pg_stat_statement |
Date: | 2015-01-21 14:46:53 |
Message-ID: | 54BFBBDD.8080302@dunslane.net |
Lists: | pgsql-hackers |
On 01/21/2015 09:27 AM, Arne Scheffer wrote:
> Sorry, corrected second try because of copy&paste mistakes:
> VlG-Arne
>
>> Comments appreciated.
>> Definition var_samp = Sum of squared differences / (n-1)
>> Definition stddev_samp = sqrt(var_samp)
>> Example N=4
>> 1.) Sum of squared differences
>> 1_4Sum(Xi-XM4)²
>> =
>> 2.) adding nothing
>> 1_4Sum(Xi-XM4)²
>> +0
>> +0
>> +0
>> =
>> 3.) nothing changed
>> 1_4Sum(Xi-XM4)²
>> +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
>> +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
>> +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
>> =
>> 4.) parts reordered
>> (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
>> +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
>> +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
>> +1_1Sum(X1-XM1)²
>> =
>> 5.)
>> (X4-XM4)(X4-XM3)
>> + (X3-XM3)(X3-XM2)
>> + (X2-XM2)(X2-XM1)
>> + (X1-XM1)²
>> =
>> 6.) XM1=X1 => There it is - The iteration part of Welford's Algorithm (in reverse order)
>> (X4-XM4)(X4-XM3)
>> + (X3-XM3)(X3-XM2)
>> + (X2-XM2)(X2-X1)
>> + 0
>> The missing piece is 4.) to 5.)
>> it's algebra, look at e.g.:
>> http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
>
>
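A note on the quoted derivation: the step from 4.) to 5.) can be written out as follows. This is only a sketch, and the symbols are assumptions introduced here: M_k stands for the mean of the first k values (XMk in the quoted text) and S_k for the sum of squared differences of the first k values from M_k (1_kSum(Xi-XMk)² in the quoted text).

```latex
\begin{align*}
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k},
  \qquad d := x_k - M_{k-1},
  \qquad x_k - M_k = \frac{k-1}{k}\,d \\
S_k &= \sum_{i=1}^{k-1} (x_i - M_k)^2 + (x_k - M_k)^2 \\
    &= S_{k-1} + (k-1)\,(M_k - M_{k-1})^2 + (x_k - M_k)^2 \\
    &= S_{k-1} + \frac{k-1}{k^2}\,d^2 + \frac{(k-1)^2}{k^2}\,d^2 \\
    &= S_{k-1} + \frac{k-1}{k}\,d^2 \\
S_k - S_{k-1} &= (x_k - M_{k-1})\,(x_k - M_k)
\end{align*}
```

The second line uses the fact that the first k-1 values sum to (k-1) M_{k-1}, so the cross term vanishes. Each bracketed difference in 4.) is therefore one product in 5.).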
I have no idea what you are saying here.
Here are comments in email to me from the author of
<http://www.johndcook.com/blog/standard_deviation> regarding the divisor
used:
    My code is using the unbiased form of the sample variance, dividing
    by n-1.
    It's usually not worthwhile to make a distinction between a sample
    and a population because the "population" is often itself a sample.
    For example, if you could measure the height of everyone on earth at
    one instance, that's the entire population, but it's still a sample
    from all who have lived and who ever will live.
    Also, for large n, there's hardly any difference between 1/n and
    1/(n-1).
Maybe I should add that in the code comments. Otherwise, I don't think
we need a change.
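To make the divisor question concrete, here is a minimal Python sketch of the running update discussed in this thread, with both divisors applied to the same accumulated sum. It is an illustration only, not the code in the patch: the function name `welford` and the timing values are made up for the example.

```python
import math
import statistics


def welford(values):
    """Return (n, mean, m2); m2 is the running sum of squared differences."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        n += 1
        delta = x - mean           # x_k minus the previous mean M_{k-1}
        mean += delta / n          # mean is now M_k
        m2 += delta * (x - mean)   # adds (x_k - M_{k-1}) * (x_k - M_k)
    return n, mean, m2


# Hypothetical execution times in milliseconds (made-up data).
times = [12.0, 15.5, 9.25, 30.0]

n, mean, m2 = welford(times)
var_samp = m2 / (n - 1)   # unbiased sample variance, divisor n-1
var_pop = m2 / n          # population variance, divisor n
stddev_samp = math.sqrt(var_samp)

# Cross-check against the standard library's two-pass implementations.
assert math.isclose(var_samp, statistics.variance(times))
assert math.isclose(var_pop, statistics.pvariance(times))
```

The two variances differ only by the factor (n-1)/n, which is 0.75 for n=4 but 0.9999 for n=10000, so the choice matters mainly for queries that have been executed only a few times.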
cheers
andrew